If, indeed, the ultimate aim of a computing network is resource
sharing, then the human component as well as the technical component
of  networking  must  be  fully  investigated  to  achieve  this  goal.   This 
research  is  a  first  step  toward  assisting  the  user  in  participating 
in  the  vast  store  of  resources  available  on  a  network.  Analytical, 
simulation  and  statistical  performance  evaluation  tools  are  employed 
to  investigate  the  feasibility  of  a  dynamic  response  time  monitor  that 
is  capable  of  providing  comparative  response  time  information  for  users 
wishing  to  process  various  computing  applications  at  some  network  computing 
node.   In  particular,  the  following  areas  are  investigated: 

1.  The  measurement  and  statistical  analysis  of 
response  times  of  individual  time-sharing  systems 
on  a  computing  network. 

2.  The  comparison  of  response  times  of  these  same 
time-sharing  systems  as  they  process  a  set  of 
benchmark  jobs. 

3. The development of a single analytical and a single
simulation model able to explain and predict
response times for all time-sharing systems under
investigation.

4. The effect of heavy network traffic on the comparative
response times of the individual time-sharing systems.

The research clearly reveals that sufficient system data
is  currently  obtainable,  at  least  for  the  five  diverse  ARPA  network 
systems  studied  in  detail,  to  describe  and  predict  response  time  for 
network  time-sharing  systems  as  it  depends  on  some  measure  of  system 
busyness  or  load  level. 

1. INTRODUCTION

Less  than  a  decade  ago,  the  time-sharing  concept  on  single 
computer  systems  was  one  of  the  main  objects  of  computer  science  inquiry. 
There  existed  a  wide  divergence  of  opinion  on  such  issues  as  where  the 
technology  stood,  key  application  possibilities,  feasibility,  future 
directions  and  economics.   Today  the  resource  sharing  concept  on  networks 
of  computer  systems  has  moved  into  the  spotlight  and  become  the  object 
of  identical  kinds  of  inquiries. 

1.1.   Computer  Network  Evaluation  Trends 

Although  the  computer  network  concept  developed  in  an 
unrevolutionary  manner,  proceeding  logically  and  in  an  orderly  way  from 
the  development  of  highly  sophisticated  single  processor  systems,  the 
performance  evaluation  techniques  developed  for  single  processor  systems 
differ  radically  from  those  developed  for  geographically  distributed 
multiple  processor  computer  systems.   Performance  evaluation  in  single 
processor  systems  is  characterized  by  a  hodge-podge  of  performance  goals 
and  performance  measurements.  The  most  significant  convergence  of  thought 
among  single  processor  systems'  analysts  is  agreement  that  what  is 
required  is  a  quantitative  methodology  on  which  to  base  analysis  of  real 
system  data  for  model  formulation  and  validation.   Performance  evaluation 
in  networks,  on  the  other  hand,  where  it  has  been  present,  has  been 
characterized  by  a  careful  development  of  analytic  and  simulation  network 


models, generally supported by data analyzed using optimization and
statistical techniques. These evaluation techniques, as well as their
specific application in existing or proposed networks, are surveyed
elsewhere [MAM74].

An  examination  of  existing  models  and  measurements  in  computer 
network systems reveals several trends. Analysis based on queueing
theory has been anchored in a node-by-node approach, assuming independence
of  the  various  network  nodes.  This  approach  works  very  satisfactorily 
for  a  limited  set  of  network  phenomena.   Simulation  has  been  successfully 
used,  but  can  become  prohibitively  expensive  when  detailed  representations 
of the network system are required [KLE70, SAL73, WAR39]. Optimization
techniques have been effectively transferred from network flow theory
and are working well to yield specific design parameters [WHI72]. Actual
system measurements, analyzed using statistical techniques and used to
improve queueing and simulation models, have been relatively neglected
[COL71]. (This neglect may be due in part to the unavailability of tools
for  making  desired  observations  of  dynamic  systems  and  of  statistically 
significant  test  environments.)  Finally,  although  sophisticated  performance 
evaluation  tools  are  generally  available,  they  have  been  applied  almost 
solely  to  the  ARPA  (Advanced  Research  Projects  Agency)  network. 

Not  the  least  important  among  the  recent  trends  in  computer 
network  performance  evaluation  is  research  aimed  at  aiding  the  user  in 
optimizing  job  routing  and  scheduling,  and  minimizing  job  cost.   This 
trend  has  been  spurred  by  a  relatively  stable  network  technology,  coupled 
with  an  ever  increasing  number  of  general  network  users.  From  their 
embryonic  days  of  the  late  1960's  until  just  recently,  computer  networks 


have  been  a  subject  of  interest  mainly  to  universities  and  research 
agencies. As late as January of 1973, the ARPA network [ROB70] statistics
were showing that even though the network was reliable and available,
communication lines were used 3.5 percent on the average [MCQ73]. Also
in  1973,  the  MERIT  network  [HER72]  found  itself  in  serious  financial 
difficulties  due  to  lack  of  interest  by  a  sufficient  number  of  users. 
However,  over  the  last  year,  very  substantial  interest  has  been  materializing 
in  the  wider  university  and  research  communities  and  in  the  commercial 
world as well.

The  Distributed  Computer  System  network  concept  developed 
by  D.  Farber  [FAR72]  at  the  University  of  California  at  Irvine  has  been 
a  significant  exception  to  the  common  mode  of  development  of  network  systems 
which  provides  inter-connected  computer  resources  but  requires  users  to 
do their own unadvised job scheduling. The host sites on Farber's
ring-structured network send bids for jobs back to the customers, thus
providing them with some criteria by which to choose a particular host for
job  processing.   The  majority  of  operational  networks,  though,  do  not 
provide  the  user  with  formal  information  on  comparative  job  costs  or 
comparative  job  run  times. 

Marshall  Abrams  at  the  National  Bureau  of  Standards  should  also 
be  mentioned  here  as  another  unique  contributor  to  user-oriented  network 
performance evaluation research. He has developed a
"stimulus-acknowledgment-response" model to describe the user-computer
interaction and a data acquisition system called the Network Measurement
Machine. He is using these tools to analyze network performance as perceived
by a network user or the "consumer of computer services" [ARB74].


1.2.   Computer  Network  Evaluation  Deficiencies;  A  Problem  Statement 

There exists a need, then, for network performance evaluation
efforts to be geared toward aiding the network user in the decision-making
inherent in network interactions. Network designers and managers have
been the fortunate recipients of analytical, simulation and statistical
tools useful in carrying out their network duties. These same tools of
the network performance analysts must also be applied to answer questions
of importance to the user.

While cost-effectiveness is an important performance factor,
response time is often the primary performance parameter of interest to
users and, in particular, interactive or time-sharing system response
time. Given a choice of different interactive computing systems with
varying capabilities for handling particular types of computer applications,
network users need to be advised of the comparative turnaround or response
times of those systems.

More specifically, for a given network facility, let the system
environment for a user at a particular time, $t$, be described by the set
$\{i, j, k_i(t), T_i(s,j)\}$, where

$i$          is one of a set of $n$ time-sharing computing
             systems accessible from the facility (presumably
             $n$ is a constant over reasonably short periods
             of time),

$j$          is one of a set of $m$ computing applications
             required by the user (presumably $m$ is a
             constant over reasonably short periods of time),

$k_i(t)$     is the load level on the $i$th computer system
             at time $t$ (for convenience, $k_i(t)$ is partitioned
             at the $i$th facility into ten equal length
             intervals), and

$T_i(s,j)$   (called "response time") is the time required
             at load $s$, where $s = k_i(t)$ at some time $t$, to
             complete the execution of a run command for the
             $j$th application at the $i$th facility.

Within this system environment, answers to the following questions
must be provided:

(1) For some particular system $i$, is it possible to
describe and predict the behavior of $T_i(s,j)$ as $s$
varies with time? (Discussed in section 3.3.1.)

(2) At some time $t$, is it possible to meaningfully
compare $T_i(s,j)$ for a particular $j$th computing
application when run at $n$ different time-sharing
computing facilities? (Discussed in section 3.3.2.)

(3) Is there a single response time model (analytical,
simulation or statistical) that will describe and
predict $T_i(s,j)$ for each $i$ and each $j$ with an
acceptable level of accuracy? (Discussed in sections
4.4.1 - 4.4.3.)

(4) What is the effect of network traffic on $T_i(s,j)$?
(Discussed in section 4.4.4.)

If  the  first  three  of  these  questions  above  can  be  answered  affirmatively, 
then  it  will  be  feasible  to  develop  a  dynamic  response  time  monitor  that  users 
can  query  to  gain  up-to-the-minute,  on-line,  comparative  response  time  data 
for  a  particular  computing  application  to  be  run  on  one  of  a  set  of 
network  time-sharing  facilities. 
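
As a minimal sketch of how such a monitor might answer a user query, the Python fragment below assumes a table of measured mean response times indexed by system, application and load interval. The function names, the use of a per-system maximum load to form the ten equal-length intervals, and all sample figures are hypothetical illustrations, not measurements or designs taken from this study.

```python
def load_interval(load, max_load, intervals=10):
    """Map a raw load level k_i(t) onto one of ten equal-length intervals (0..9)."""
    if max_load <= 0:
        raise ValueError("max_load must be positive")
    index = int(load * intervals / max_load)
    # Clamp so that a load at (or beyond) capacity falls in the top interval.
    return max(0, min(index, intervals - 1))


def compare_response_times(table, current_load, max_load, application):
    """Return (system, expected response time) pairs, fastest first.

    table[system][application][s] holds T_i(s, j) in seconds (assumed layout).
    """
    results = []
    for system, per_app in table.items():
        if application not in per_app:
            continue  # system i does not offer application j
        s = load_interval(current_load[system], max_load[system])
        results.append((system, per_app[application][s]))
    return sorted(results, key=lambda pair: pair[1])


# Hypothetical data: two systems, one application, ten load intervals each.
table = {
    "SYS-A": {"compile": [2.0 + 0.5 * s for s in range(10)]},
    "SYS-B": {"compile": [1.0 + 1.0 * s for s in range(10)]},
}
ranking = compare_response_times(
    table,
    current_load={"SYS-A": 30, "SYS-B": 45},
    max_load={"SYS-A": 60, "SYS-B": 50},
    application="compile",
)
```

In this sketch the lightly loaded system is ranked first even though the other system is faster when idle, which is the kind of comparative answer the monitor is meant to supply.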

1.3. Time-Sharing System Evaluation

The  research  required  to  answer  the  response  time  queries 
of  the  network  user  cuts  across  two  distinct,  but  related  areas — comparison 
of  independent  computing  systems  and  the  investigation  of  response  time 
parameters  in  time-shared  systems.  The  work  done  to  compare  systems  is 
sparse.   One  significant  comparative  study  of  computing  machines  has  been 
published  and  one  is  presently  in  process.  K.  E.  Knight  has  compared 
the  performance  capabilities  for  318  general  purpose  computer  systems 
in terms of computing power and cost [KNI66, KNI68]. His measurements
spanned the evaluation of machines from 1944-1967 and distinguished
machine  capabilities  in  performing  "scientific"  computations  from  those 
in  performing  "commercial"  computations.  P.  A.  Alsberg  from  the 
University  of  Illinois  at  Urbana-Champaign  has  directed  research  aimed  at 
producing  comparative  data  for  machine  cost-effectiveness  as  it  is 
measured  across  six  interactive  computing  systems  performing  four  different 
types of work--(1) numerical, (2) console, (3) input/output, and (4)
bit/byte manipulation. All of the six computing systems either are on or
will be added to the ARPA network. A third comparative study of computing
systems was performed by P. E. Jackson and his associates [FUC70, JAC69].
This  work  will  be  discussed  below. 

Extensive  measurements  and  performance  evaluations  of  response 
time  in  time-sharing  systems  have  been  reported  by  several  independent 
researchers.  Kleinrock  [KLE72]  has  produced  a  survey  of  these  performance 
studies,  with  an  emphasis  on  analytical  results.  Studies  based  mainly 
on  system  measurements  rather  than  analytical  models  have  also  resulted 
in  important  contributions  to  the  field. 


A. L. Scherr [SCH67], who analyzed a large set of measurements
taken on the MIT Project MAC Compatible Time-Shared System (CTSS),
concluded from his work that only mean think time, mean processor time
and the number of users interacting with the system are of first-order
effect in describing system behavior. R. A. Totschek's contribution
[TOT65], resulting from his study of the SDC Q-32 system, was characterized
by the classification of many of the empirical distributions associated
with interactive usage as having density functions with long, slowly
decreasing tails and standard deviations exceeding the mean value.

Jackson  and  Stubbs  [JAC69]  studied  a  number  of  time-shared 
systems  and  determined  average  values  for  a  variety  of  measurements 
relevant  to  interactive  systems--think  time,  idle  time,  response  time  and 
so  on.  Later  Jackson  along  with  Fuchs  [FUC70]  estimated  the  distribution 
of  many  of  these  random  variables.   This  study  reiterated  Totschek's 
finding  in  that  Jackson  and  Fuchs  found  that  for  all  the  continuous 
random  variables,  the  gamma  distribution  was  an  excellent  fit  and  that  the 
parameter  in  the  gamma  distribution  ranged  between  1.0  and  1.8.  At  1.0 
the  distribution  becomes  exponential,  and  even  at  1.8  its  tail  is  still 
definitely  exponential. 
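
The remark about the gamma parameter can be checked numerically. The sketch below, which uses only the Python standard library, evaluates the gamma density directly from its textbook formula; the function names and the unit scale are assumptions made for illustration. It shows that a shape parameter of 1.0 reproduces the exponential density, and that at 1.8 the ratio of densities one unit apart in the far tail is close to the exponential value exp(-1).

```python
import math


def gamma_pdf(x, shape, scale=1.0):
    """Gamma density at x > 0: x^(a-1) e^(-x/b) / (Gamma(a) b^a)."""
    return (x ** (shape - 1) * math.exp(-x / scale)) / (
        math.gamma(shape) * scale ** shape
    )


def exponential_pdf(x, scale=1.0):
    """Exponential density with mean `scale`."""
    return math.exp(-x / scale) / scale


def tail_ratio(x, step, shape, scale=1.0):
    """f(x + step) / f(x); an exponential tail gives exactly exp(-step / scale)."""
    return gamma_pdf(x + step, shape, scale) / gamma_pdf(x, shape, scale)


# Shape 1.0: the gamma density coincides with the exponential density.
difference = abs(gamma_pdf(2.0, 1.0, 3.0) - exponential_pdf(2.0, 3.0))

# Shape 1.8: far out in the tail the decay rate is nearly exponential.
relative_gap = tail_ratio(200.0, 1.0, 1.8) / math.exp(-1.0) - 1.0
```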

The  essential  elements  of  the  research  methodologies  associated 
with  comparing  computer  systems  and  those  associated  with  describing  the 
behavior of time-shared systems can be abstracted from the work reported
above. The comparative system studies are characterized by (1) running
benchmark jobs with specified properties and (2) measuring well-defined
quantities obtainable from all of the machines involved. The time-shared
studies also have two essential characteristics: (1) the conception
of all interactions with the time-shared system (compiles, edit commands,
run commands, etc.) as being of equal significance and the measurement
of them as such, and (2) the development of models to describe and predict
system behavior.

The  research  methodology  required  to  compare  response  times  for 
different  job  applications  on  different  machines  is  similar  to  that 
methodology  already  used  in  comparing  systems,  but  somewhat  different  from 
the  methodologies  used  to  date  in  studying  response  time  data.  Since 
comparative  results  are  required,  running  jobs  with  identical  characteristics 
on  each  system  and  measuring  well-defined  quantities  obtainable  from  each 
system  is  an  appropriate  and  useful  procedure.  On  the  other  hand,  while 
former studies on time-shared systems considered all interactions to be of
equal importance and measured and modeled under this assumption, we are
concerned here only with job execution interactions. Furthermore, our
concern is with run command response time measurements and models for
specific computing applications.


2.    COMPARATIVE  RESPONSE  TIMES  ON  THE  ARPA  NETWORK 

The  task  of  providing  the  network  user  with  information  to 
facilitate decisions concerning job routing must be accomplished within
the  framework  of  the  present  network  technology.   Theoretically,  such 
aids  may  be  as  sophisticated  as  a  "black  box"  environment  in  which  users 
need  merely  indicate  the  type  of  job  and  special  resources  required  and 
jobs  are  automatically  scheduled  to  run  with  minimum  response  time.   Given 
the  present  configuration  of  even  the  most  advanced  networks,  however, 
a  scheme  best  able  to  be  readily  implemented  would  be  one  in  which  the 
network  interface  processors  contained  sufficient  information  to  indicate 
current expected response times for all time-sharing systems on the network.
Such  a  scheme  would  require  that  both  the  user  and  the  computing  hosts 
input  relevant  information  with  which  the  interface  processor  can  make  its 
predictions.  Generally,  users  are  able  to  indicate  the  expected  execution 
time required for a job and characterize the job as basically I/O bound
or  CPU  bound.  Generally,  time-sharing  systems  provide  some  measure  of 
load,  such  as  number  of  users. 

One  of  the  major  purposes  of  this  research  is  to  develop  a 
process  for  the  network  interface  processor  (dynamic  response  time  monitor) 
that,  given  the  user  and  system  input  described,  can  predict  response 
time  within  some  confidence  interval.  A  combination  of  statistical, 
analytical  and  simulation  tools  will  be  used  to  produce  this  result. 
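
One simple statistical form such a prediction could take is sketched below: a mean response time with a normal-approximation confidence interval computed from repeated measurements at a single load level. This is an illustrative assumption, not the procedure developed in this report; the function name, the choice of z = 1.96 (roughly 95 percent confidence) and the sample values are all hypothetical.

```python
import math
import statistics


def predict_response_time(samples, z=1.96):
    """Return (mean, lower, upper) for the mean response time.

    Uses the interval mean +/- z * s / sqrt(n). The normal approximation is
    an assumption of this sketch and is reasonable only for moderately
    large samples.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical run-command response times (seconds) observed at one load level.
observed = [4.1, 3.8, 4.6, 5.0, 4.3, 3.9, 4.4, 4.7]
mean, low, high = predict_response_time(observed)
```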

The Advanced Research Projects Agency (ARPA) network has been
chosen as the environment for the research project and five different
time-sharing systems accessible from the network will be investigated.*
A brief description of the ARPA network and a definition of the system
variables--the computing systems, the benchmark jobs each representing
a particular computing application, response time and load level--are
given below.

2.1. The ARPA Network

The  ARPA  network  shown  in  Figure  2.1  is  generally  recognized 
as  the  pioneering  effort  in  computer  networking  and  resource  sharing 
research.  The  initial  objective  of  the  network  was  to  provide  a  system 
research  environment  in  which  the  technical  problems  of  networks  could 
be  explored  by  allowing  persons  and  programs  at  one  computing  center  to 
interactively  access  data  and  programs  at  other  computer  centers  attached 
to the network. A packet switching store-and-forward network** whose nodes
consist of interface message processor computers (IMPs) was set up and
interconnected by 50 kilobit/second synchronous communication lines.
The  host  computers  ranged  from  large-scale  general  purpose  systems  such 


*Both  experimental  design  and  practical  considerations  influenced  the 
decision  to  limit  this  investigation  to  just  five  systems.  The  systems 
themselves  were  diverse  enough  to  represent  a  wide  range  of  time-sharing 
scheduling  philosophies  and  research  funds  were  available  for  this 
specific  set  of  computing  nodes. 

**Definitions for such technical terms as "packet switching" and
"store-and-forward network" are given in Appendix A along with a ready-reference
list  of  frequently  used  abbreviations. 

Figure 2.1. ARPA Network Configuration in Early 1974

as PDP-10s and IBM 360/370s to specialized computers such as the Illiac IV
and the Massachusetts Institute of Technology (MIT) Multics System. The
network is considered to be a technological and resource sharing success
as is evidenced by its operational accomplishments which include:

-  remote  use  of  computers  either  from  a  termination  on 
a  host  or  a  terminal  interface  processor  (TIP) 

-  file  movement  and  printing 

-  communication  of  personal  messages  by  way  of  "mailboxes" 

-  machine-to-machine  subroutine  communication 

-  access  to  large  common  data  bases. 

2.2. System Variables

Five  interactive  operating  systems  currently  available  on  the 
ARPA network were chosen for comparison. The performance of these operating
systems  is  essentially  tied  to  the  computing  installation  supporting 
and  maintaining  them.  All  references  to  interactive  systems,  therefore, 
will  include  the  computing  site  as  well  as  the  name  of  the  system  itself. 
A  summary  of  the  five  systems  and  their  basic  scheduling  characteristics 
is  presented  in  Table  2.1.  A  more  detailed  discussion  of  each  of  the 
systems follows. Throughout the discussion, reference is made to the
"working  set"  of  a  process,  in  describing  its  paging  behavior.  This 
concept  was  first  defined  by  Denning  and  is  explained  in  an  article  written 
by  him  [DEN68] . 

Four  of  the  five  interactive  systems  (all  but  AMES-TSS)  dispatch 
jobs  to  the  processor  using  a  scheduling  algorithm  whose  major  components 
are  a  series  of  priority  queues  and  associated  CPU  time-slices.  As  jobs 

Table 2.1. Computing Systems Summary

Location                Hardware Configuration*   Scheduler Characteristics**
----------------------  ------------------------  ------------------------------
AMES67--NASA Ames       IBM 360/67                Table driven; frequency
Research Center,        1000K-2000K core,         and duration of processor
Moffett Field, CA       5 disks, 3 drums,         time-slices determined by
(TSS)                   10 tape drives, 3         paging behavior
                        printers, 2 card
                        readers, 64 terminals

BBN--Bolt, Beranek      PDP-10                    Five priority queues
and Newman, Inc.,       193K core, 9 disks,       with SXFS processing
Cambridge, MA           1 drum, 4 tape drives,    among queues, LXFS
(TENEX)                 1 printer, 64 terminals,  processing within queues,
                        1 display processor,      RR processing in the
                        1 plotter, 1 paper tape   last queue
                        punch, 1 paper tape
                        reader, 1 teletype
                        scanner

CCN--Campus Computing   IBM 360/91                Series of priority queues,
Network,                4000K core, 5 disks,      each with lower dispatching
Los Angeles, CA         1 drum, 8 tape drives,    priority and effectively
(TSO)                   4 printers, 85            a longer time-slice than
                        terminals                 the former; each queue
                                                  served FIFO

MIT--Massachusetts      Honeywell 645             Series of priority queues,
Institute of            384K core, 11 disks,      each with lower dispatching
Technology,             1 drum, 10 tape drives,   priority and a longer fixed
Cambridge, MA           2 printers, 1 I/O         time-slice than the former;
(MULTICS)               controller, 1 card        each queue served FIFO
                        reader, 1 card punch,
                        "several hundred"
                        terminals

UCSD--University of     Burroughs 6700            Basically two priority queues,
California at           2I40K core, 19 disks,     with high priority queue of
San Diego, CA           8 tape drives, 3          burst-oriented processes and
(CANDE)                 printers, 1 remote job    low priority queue of compute
                        entry terminal, 1 card    bound processes; both queues
                        punch, 1 card reader,     served FIFO
                        512 terminals

*Detailed hardware descriptions are available in [ANR73a];
information is accurate as of August, 1973.

**FIFO - First arrival, first service
RR   - Round robin
SXFS - Shortest execution, first service
LXFS - Longest execution, first service

Table 2.1 (continued). Computing Systems Summary

Location      Memory Management             Remarks
------------  ----------------------------  ------------------------------
AMES67--TSS   Estimations of working set    Distinguished by a scheduler
              and working set size          that is primarily concerned
              characteristics are the       with core demands rather than
              heart of the scheduler;       CPU demands.
              "balance core time
              principle" used to determine
              time-slicing

BBN--TENEX    Balance set control module    Most sophisticated of
              in scheduler regulates        schedulers; embodies all three
              running processes so as to    scheduling disciplines of
              minimize the probability of   SXFS, LXFS, and RR
              an idle CPU due to too
              frequent page faults

CCN--TSO      Fixed (virtual) region size   Distinguished by binding
              allotted to each virtual      processes to one of a fixed
              machine; single process       number of virtual machines
              currently on a virtual        within which no
              machine has access to         multiprogramming occurs
              entire region

MIT--MULTICS  A list of "eligible"          Concept of set of "eligibles"
              processes is maintained       insures efficient resource
              consisting of those           utilization in a
              processes which have the      multiprogramming environment
              highest dispatching
              priority and can
              simultaneously exist in core

UCSD--CANDE   Multiprogramming paged        Simplest of time-sharing
              system in which each core     scheduling philosophies;
              resident process can expand   like TSS, time-slices are
              core holdings up to the       associated with a process
              maximum size of its           rather than a queue
              currently assigned
              "subspace"

enter the system, they are assigned to the highest priority queue. This
queue has a relatively short time-slice associated with it. If the job
uses its entire time-slice in its first pass through the system, it is
relegated to the second priority queue, which has a slightly longer
time-slice associated with it, and so on. Queues are served from highest
priority to lowest priority. Discipline within queues varies among FIFO,
RR, SXFS, and LXFS as explained in Table 2.1. A generalized version of
these scheduling algorithms is presented in Figure 2.2. This representation
will be made specific for each system (except AMES-TSS) as it is described
in detail.


Figure  2.2.   Generalized  Time-sharing  Scheduling 
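
The generalized scheme described above can be sketched as a small simulation. The Python fragment below is an illustration under stated assumptions, not a model of any of the five systems: all jobs are assumed to arrive at time zero, every queue is served FIFO, a job in the last queue stays there until it finishes, and the job names and time-slice values are hypothetical.

```python
from collections import deque


def run_feedback_queues(jobs, time_slices):
    """Simulate the generalized scheme; return job names in completion order.

    jobs: list of (name, cpu_time_needed).
    time_slices: one slice length per queue, highest priority first.
    """
    queues = [deque() for _ in time_slices]
    for name, needed in jobs:
        queues[0].append((name, needed))  # new jobs enter the top queue
    finished = []
    while any(queues):
        # Serve the highest-priority non-empty queue.
        level = next(i for i, q in enumerate(queues) if q)
        name, remaining = queues[level].popleft()
        remaining -= time_slices[level]
        if remaining <= 0:
            finished.append(name)
        else:
            # The whole slice was used: relegate the job to the next queue.
            lower = min(level + 1, len(queues) - 1)
            queues[lower].append((name, remaining))
    return finished


order = run_feedback_queues(
    [("long", 10), ("short", 1), ("medium", 4)],
    time_slices=[2, 4, 8],
)
```

With these hypothetical inputs the short job completes in the first queue, the medium job in the second and the long job in the third, which illustrates how light CPU demands receive the fastest service.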

2.2.1. The Computing Systems

2.2.1.1. TSS - IBM Time-Sharing System [DOH70]

The  TSS/360  interactive  system  has  a  table  driven  scheduler 
consisting  of  a  set  of  programs  in  the  resident  supervisor  used  for 
scheduling,  and  a  table  with  many  rows  (levels)  of  entries.  The  scheduling 
philosophy  is  based  on  the  premise  that  processes  making  light  demands  on 
the  CPU  and  core  resources  should  receive  fast  service  and  those  making 
heavier  demands  on  these  resources  should  receive  relatively  slower 
service. The implementation of this philosophy is concentrated almost
entirely in the constant monitoring of a process' paging requirements
(as  opposed  to  its  CPU  usage).  Programs  with  small  working  set  sizes 
are  awarded  frequent  and  comparatively  long  time-slices  in  the  processor. 
Processes  with  large  working  set  sizes  and  poor  locality  are  awarded  only 
short,  infrequent  time-slices.  This  strategy  tends  to  minimize  the  time 
that  any  large  program  can  clog  memory,  thereby  providing  a  potentially 
significant  increase  in  the  level  of  multiprogramming,  and  faster  response 
time  for  a  larger  number  of  processes. 

Assignment  of  core  resources  is  the  heart  of  the  TSS  scheduler. 
The  table  which  drives  the  scheduler  can  be  thought  of  as  being  divided 
into sets of levels grouped primarily according to the core usage
characteristics of a process. The interactive sets of table levels are the
Starting  Set,  the  Looping  Set,  the  AWAIT  set,  the  Holding  Interlock  Set 
and  the  Waiting  for  Interlock  Set. 

The Starting Set of table levels is used to handle new inputs
from  the  terminals.  This  set  consists  of  several  successive  high  priority 
table  levels,  each  with  small  execution  time  limits  and  increasingly  larger 
core  space  limits.  A  process  remains  under  control  of  the  Starting  Set 
of  table  levels  and  proceeds  through  its  various  queues  as  long  as  it 
continues  to  exceed  its  space  limits  only  (up  to  some  maximum).  When  the 
process  exceeds  its  time  limit  at  a  given  level,  the  space  limit  of  that 
level  is  used  as  the  estimate  of  the  current  working  set  size  of  that 
process  and  the  future  execution  of  the  process  is  controlled  by  the 
Looping  Set  of  table  levels. 

The Looping Set of table levels performs three significant functions.
Its  first  function  deals  with  the  dynamic  estimation  of  the  time  and  space 
requirements  of  a  process  in  accordance  with  the  balanced  core  time 
principle.   This  principle  states  that  the  length  of  the  time-slice  to 
be  awarded  to  a  process  is  inversely  proportional  to  the  working  set 
size  in  that  time  interval.   The  second  function  of  these  table  levels 
is  to  cause  the  load  generated  by  long  running  processes  to  be  distributed 
so  as  to  allow  Starting  Set  entries  to  be  processed  quickly.  Finally, 
the  Looping  Set  optimizes  CPU  utilization  and  penalizes  bad  paging  processes 
by  causing  processes  with  minimal  paging  requirements  to  be  selected  for 
running  far  more  frequently  than  those  with  large  paging  requirements. 
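
The balanced core time principle stated above amounts to holding the product of working set size and time-slice length constant. A minimal sketch follows; the function name, the tuning constant `core_time_budget` and the numbers are hypothetical, since the actual values reside in the TSS schedule table and are not given here.

```python
def time_slice(working_set_pages, core_time_budget):
    """Time-slice length inversely proportional to working set size.

    core_time_budget is a hypothetical constant (pages x milliseconds):
    every process is allowed the same product of core held and time run.
    """
    if working_set_pages <= 0:
        raise ValueError("working set size must be positive")
    return core_time_budget / working_set_pages


small = time_slice(16, 3200)   # a compact process receives a long slice
large = time_slice(64, 3200)   # a process four times larger receives a quarter of it
```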

Of  the  three  remaining  sets,  only  the  Holding  Interlock  Set 
of  table  levels  deals  with  processes  that  are  ready  to  run.  Processes 
running  from  this  set  are  currently  holding  interlocks  on  some  system 
resource and have a high priority so that the interlocked resource may be
quickly  freed.  The  AWAIT  Set  and  the  Waiting  for  Interlock  Set  administer 
processes  which  are  in  a  wait  state  for  some  reason. 

As described above, processor time-slices are allocated dependent
upon  a  process'  recent  core  usage  behavior.  The  frequency  and  duration 
of  the  time-slice  a  process  is  awarded  is  determined  by  values  in  the 
table  levels  of  the  Starting  Set,  Looping  Set  and  the  Holding  Interlock 
Set.  These  values  in  turn  are  determined  by  the  working  set  size  and 
locality characteristics demonstrated in the process' paging demands.

2.2.1.2. TENEX - PDP-10 Time-Sharing System [BOB72]

The  TENEX  scheduling  philosophy  takes  a  middle  ground  between 
two  conflicting  precepts  of  process  behavior  in  a  time-sharing  environment. 
On the one hand, the more time a process has used, the closer it is
to completion. On the other hand, the longer a process has run, the less
are the chances that it will complete "soon". Ready jobs are distributed
in queues for service, therefore, such that if two processes are widely
separated in accumulated run time (are in different queues) the one
with the lesser time will be preferred, and if two processes are closely
spaced (are in the same queue), the one with the greater time will be
preferred. This type of scheduling can be characterized as
shortest-processing-time first among queues and longest-processing-time first
within queues.
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A  second  aspect  of  the  TENEX  scheduling  philosophy  is  concerned 
with  the  complex  interplay  in  the  allocation  of  core  and  CPU  resources. 
Incorrect  handling  of  the  information  gathering  and  decision  making 
procedures  involved  in  determining  working  sets  and  core  utilization  in 
a  multi-process  paged  system  can  result  in  poor  efficiency  and  bad  service. 
Thus,  a  "balance  set  control"  module  directly  responsible  for  these 
functions  is  made  an  integral  part  of  the  scheduler. 

Figure  2.3  depicts  the  four  distinct  scheduler  modules.   The 
process  controller  and  balance  set  control  modules  will  be  discussed  in 
detail  below.   The  real-time  scheduler  is  concerned  only  with  those 
processes  which  are  currently  making  real-time  demands  on  the  system.   Its 
scheduler  portion  is  invoked  whenever  an  external  signal  or  clock  indicates 
that  rescheduling  may  be  required.   If  there  are  no  real-time  processes 
requiring  service,  then  the  selection  of  a  process  to  run  falls  to  one 
of  the  other  modules.   The  function  of  the  startup  and  dismiss  routines 
is  fairly  common  and  straightforward.   Included  in  this  module  are  routines 
to  save  and  restore  environments  as  they  go  out  of  and  into  execution. 
No  important  scheduling  or  other  decisions  are  made  by  this  module. 

The  balance  set  control  module  of  the  TENEX  scheduler  is 
responsible  for  efficient  use  of  core.   The  logical  storage  organization 
includes  the  core,  drum  and  disks  and  their  associated  channels  so  that 
the  efficient  use  of  core  is  closely  related  to  making  efficient  use  of 
the data channels to the drum and disk. Because of this logical memory
structure, when a process cannot be run because of a page fault, the
process is not considered to be in a wait state. The process is, in fact,
still demanding CPU services which cannot be given because core rather
than the CPU is not available.

Figure 2.3. The TENEX Scheduler

Three basic functions fall under the jurisdiction of the balance
set control module. These include maintaining the list of processes
in the balance set such that the working sets of all these processes can
co-exist in core, selecting a process in the balance set for running
when the running process must be stopped for a page fault, and, on
the occurrence of a rescheduling event, removing and/or adding processes
to the balance set in cooperation with the process controller.

Dynamically determining how many processes can simultaneously
reside in core and what the size of these processes should be is the
central function of the balance set control. This involves trying to
keep a balance set which maximizes the probability that there will always
be at least one process to run. That is, whenever one process experiences
a page fault, there should be another process ready to utilize the CPU
resource. This suggests that the processes must run an average time,
$T_{av}$, greater than the average interval over which one page transfer
will be completed for one of the page-waiting processes, $W_{av}$. The
balance set control module iteratively estimates $T_{av}$ and $W_{av}$ and
attempts to maintain an environment in which $T_{av} > W_{av}$.

If  the  balance  set  control  function  described  above  provides 
more  than  one  process  which  is  an  eligible  member  of  the  balance  set, 
then  an  algorithm  is  required  for  selecting  one  among  these  processes  to 
run  when  a  page  fault  occurs.  This  algorithm  is  also  a  part  of  the 
balance  set  control  module.  Finally,  several  rescheduling  events  can  occur 
which require the removal or addition of processes to the balance set.
These events include processor time quantum overflow, I/O blocks, or I/O
unblocks.  Handling  these  process  exchanges  in  and  out  of  the  balance 
set  is  a  balance  set  control  module  task. 

Processor resources in TENEX are allocated to processes chosen
from distinct ready queues, where queue position is determined by
previously accumulated processor time. Figure 2.4 is a graphic presentation
of this scheduling algorithm.

Figure 2.4. BBN-TENEX Scheduling

The scheduler prefers a process in a smaller numbered queue
over  that  in  a  higher  numbered  queue.   In  this  respect,  it  prefers 
processes  with  the  smallest  amount  of  accumulated  time.  But  further, 
within  a  queue,  the  scheduler  chooses  for  execution  the  process  with  the 
longest  accumulated  time  in  the  expectation  of  completing  a  process  which 
probably  requires  only  a  small  additional  amount  of  CPU  time. 

These queues are not extended indefinitely, but terminated
with  N  =  5  distinct  queues,  for  two  separate  reasons.  First,  a  process 
that  had  run  a  very  long  time  would  get  no  further  service  if  another 
process  began  a  long  computer  run  until  the  second  process  had  run  nearly 
as  long  as  the  first.   (A  long  running  process  could  also  be  completely 
shut  out  of  service  by  a  set  of  short  running  processes  which  used  100 
percent  of  the  CPU.)   Second,  although  the  frequency  of  rescheduling 
goes  down  as  the  queue  time  becomes  large,  a  point  is  reached  at  which 
the  rescheduling  overhead  is  an  insignificant  fraction  of  the  total  time 
and no gain is achieved by reducing it further. For these reasons, then,
a  "last  queue"  is  defined.   Processes  in  this  queue  are  scheduled  using 
a  round-robin  discipline,  disregarding  all  former  processing  history 
at  this  point  and  cyclically  giving  each  process  a  certain  quantum  of 
processing  time  in  turn. 

Use  of  this  scheduling  algorithm  requires  the  assignment 
of  three  parameters: 

-  the  factor  by  which  the  processing  time  allotted  on 
each  queue  is  greater  than  the  last 

-  the  amount  of  processor  time  allotted  on  the  first  queue 

-  the  number  of  queues. 

The  basic  principle  involved  in  assigning  these  parameters  is  that 
fewer  and  longer  queues  result  in  less  system  overhead  but  produce  a 
poorer  approximation  to  ideal  scheduling  as  represented  by  a  large 
number  of  queues.  Bolt,  Beranek  and  Newman  (BBN)  have  currently  assigned 
values  to  these  three  parameters  as  follows: 

- the i-th queue receives four times the processing allotment
of the (i-1)-st queue

- queue one allots 64 msec for processing

- there are five distinct queues.
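
As a rough illustration of the queue discipline and the BBN parameter values listed above, the following Python sketch computes the allotment for each queue and picks the next process to run. It assumes the first-queue allotment reads as 64 msec. The function names and the representation of a process as a (name, queue, accumulated_ms) tuple are inventions of this example, not part of TENEX, and the round-robin discipline of the last queue is approximated by taking its first listed process.

```python
# Sketch of the TENEX ready-queue discipline described above.
# Parameter values follow the BBN settings quoted in the text; all names
# are illustrative assumptions.
FIRST_ALLOTMENT_MS = 64   # processor time allotted on the first queue
GROWTH_FACTOR = 4         # each queue allots four times the previous one
NUM_QUEUES = 5            # number of distinct queues


def queue_allotment_ms(queue):
    """Processing allotment (msec) for a 1-based queue number."""
    return FIRST_ALLOTMENT_MS * GROWTH_FACTOR ** (queue - 1)


def pick_next(ready):
    """Choose the next process from a list of (name, queue, accumulated_ms).

    Lower-numbered queues are preferred. Within a queue, the process with
    the longest accumulated time is chosen. The last queue is round-robin,
    approximated here by taking its first listed process.
    """
    if not ready:
        return None
    best_queue = min(queue for _, queue, _ in ready)
    candidates = [p for p in ready if p[1] == best_queue]
    if best_queue == NUM_QUEUES:
        return candidates[0]
    return max(candidates, key=lambda p: p[2])
```

Under these assumptions the five allotments come out as 64, 256, 1024, 4096 and 16384 msec, which shows how quickly the rescheduling frequency drops for long-running processes.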

Up to this point, the discussion of the scheduler has been
limited to handling jobs on the ready queue. The scheduling algorithm
also keeps account of processes waiting for some external condition or
event such as an I/O device to complete or a user to type a character.
In this case, the scheduler's goal is to ensure that these processes
too will receive their fair share of processing time, i.e., about 1/M
of the CPU, where M is the number of processes in the system. The
scheduler achieves this goal by using the following procedure. During
the periods in which a process is in the wait state, the process is
"credited" for CPU time not used by reducing the accumulated time
values at the rate of 1/M. Reducing this quantity tends to move the
process to the higher priority (lower numbered) queues so that it will be
preferred over other processes which continue to run. This procedure does
not include waits occasioned by disc or drum transfers as explained in the
previous section describing the core allocation algorithm.
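
The crediting procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: accumulated time is tracked in milliseconds, the credit is simply the wait time divided by M, and the result is floored at zero (the floor is not stated in the text). `credit_for_wait` is a hypothetical helper name.

```python
def credit_for_wait(accumulated_ms, wait_ms, num_processes):
    """Reduce a waiting process' accumulated CPU time at the rate 1/M.

    accumulated_ms: processor time charged to the process so far
    wait_ms:        time spent in the wait state (page waits excluded)
    num_processes:  M, the number of processes in the system
    """
    if num_processes <= 0:
        raise ValueError("num_processes must be positive")
    credit = wait_ms / num_processes
    # Assumption of this sketch: accumulated time never goes negative.
    return max(0.0, accumulated_ms - credit)
```

For example, with M = 10 a process that waits 5 seconds is credited 500 msec, which may be enough to return it to a queue with a smaller number.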

2.2.1.3. TSO - IBM Time-Sharing Option

The  basic  scheduling  philosophy  of  the  TSO  time-sharing  system 
is  to  award  fast  response  times  to  processes  requiring  only  a  short 
amount  of  CPU  service.  Processes  requiring  increasingly  longer  amounts 
of  processing  time  experience  proportionately  longer  response  times. 
This  philosophy  is  implemented  in  a  series  of  queues  (usually  three  or 
four) through which a process descends during its residence in the system.
Each  queue  has  a  lower  dispatching  priority  than  the  former  one,  and  each 
queue  typically  allots  a  longer  processing  time-slice  to  its  members. 
Processes  are  served  strictly  first-come,  first-served  within  queues. 

The TSO time-sharing system can be run in a real or virtual
memory system, for example OS/MVT* or OS/VS2*, respectively. The basic
concept behind core assignment is the same in both types of systems, but
the implementation of the assignment is, of course, different. A
predetermined number of regions, say four, is set up in memory and these
regions  form  separate  virtual  processing  systems  which  are  assigned  to 
users  as  they  log  onto  TSO.  Users  are  associated  with  one  of  these 
regions  exclusively  for  the  duration  of  their  working  session.  Each 
of  the  virtual  systems  acts  independently  of  the  others  and  each  has 
an  independent,  optionally  identical  scheduling  algorithm  as  described 
below. Within a region (or virtual processing system) no multiprogramming
exists.  Each  process  has  use  of  the  entire  core  and  CPU  resources 
assigned  to  its  region  until  it  is  swapped  out  in  total  and  put  back 
on  one  of  the  dispatching  queues.  The  UCLA  Campus  Computing  Network 
(CCN)  TSO  system  is  an  OS/MVT  system  with  one  memory  region. 


*See  Appendix  A  for  definitions. 

The TSO scheduler chooses processes for running dependent
only on the most recent behavior of the process. That is, only the last
cause for removal from execution (I/O request, timer run-out, etc.) is
used to determine the next queue position for that process. The dispatching
algorithm, illustrated in Figure 2.5, typically defines three queues,
Q1, Q2, and Q3, to which a ready process may be assigned. The first
queue consists of processes which have just passed from a blocked (or
wait) state to a ready state. These processes have the highest dispatching
priority and are served first-come, first-served within Q1. The second
and third queues consist of processes which experienced a timer run-out
during their last time-slice in Q1 or Q2, respectively.

In general, an extensive set of parameters exists with which
to manipulate the function of dispatching processes for CPU service.
CCN-TSO in effect controls its queues by setting three of these optional
parameters to significant values. The "preempt" option is enabled, and
parameters called "min-slice" and "occupancy time" are set for each queue.
The occupancy time associated with a queue is the maximum time-slice
of execution allowable to a process from that queue. These values are
presently set at 2.0 seconds for Q1, 4.0 seconds for Q2, and 16.0 seconds
for Q3. The min-slice settings work in conjunction with the preempt
option and presently are assigned values of 1.6 seconds for Q1, 2.0
seconds for Q2, and 3.0 seconds for Q3. These latter values override
the occupancy time settings in the following way. If a process is
queued for service at the same or higher priority level than a process
presently holding the CPU, then the process holding the CPU is preempted
after its respective min-slice, rather than being allowed to utilize
its entire occupancy time quantum of service. Preempted processes
return to the queue from which they had just come, until they have been
allocated processor time equal to the occupancy time for that queue.
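
The interaction of occupancy time, min-slice and preemption can be sketched as below. This is a minimal illustration, not the TSO driver itself. It assumes the Q2 occupancy time reads as 4.0 seconds and that a process suffering a timer run-out in Q3 stays in Q3 (the text does not say). The names `next_queue` and `run_time` are hypothetical.

```python
# Illustrative CCN-TSO dispatching parameters (seconds), as read from the text.
OCCUPANCY = {1: 2.0, 2: 4.0, 3: 16.0}   # maximum time-slice per queue
MIN_SLICE = {1: 1.6, 2: 2.0, 3: 3.0}    # guaranteed slice before preemption


def next_queue(current_queue, reason):
    """Queue a process joins after leaving execution.

    reason is "unblocked" (a wait state ended), "timer_runout"
    (occupancy time used up) or "preempted" (cut short at the min-slice).
    """
    if reason == "unblocked":
        return 1
    if reason == "timer_runout":
        # Assumption: a run-out in the last queue leaves the process there.
        return min(current_queue + 1, 3)
    if reason == "preempted":
        return current_queue
    raise ValueError(f"unknown reason: {reason}")


def run_time(queue, used_in_queue, contender_queue=None):
    """Seconds a process may run now, given time already used in this queue.

    contender_queue is the queue of the best waiting process, if any. A
    contender at the same or higher priority (same or lower queue number)
    limits the run to the min-slice.
    """
    remaining = OCCUPANCY[queue] - used_in_queue
    if contender_queue is not None and contender_queue <= queue:
        return min(MIN_SLICE[queue], remaining)
    return remaining
```

With these settings a Q1 process facing another Q1 contender runs 1.6 seconds, returns to Q1, and later receives only the 0.4 seconds left of its 2.0 second occupancy time.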

Figure  2.5.   CCN-TSO  Scheduling 

2.2.1.4. MULTICS - MIT Time-Sharing System [ORG72]

The MULTICS time-sharing scheduler design was based on the
philosophy  that  the  higher  the  load  a  process  places  on  the  system  when 
it  is  allowed  to  run,  the  lower  its  scheduling  priority  should  be.  Thus, 
processes  requiring  the  smallest  amount  of  processor  time  share  the 
highest  priority  queue.   Principally  because  of  memory  limitations, 
however, not all equal-priority processes can share the processor
simultaneously.  The  basic  time-sharing  scheduling  philosophy,  then,  is 
modified  by  a  multiprogramming  scheduling  function.  This  multiprogramming 
function  restricts  access  to  the  processor  to  an  appropriate  subset  of 
equal-priority processes called the "eligibles". This subset is chosen
small  enough  so  that  work  that  is  done  for  each  member  is  not  degraded, 
for  instance,  by  thrashing. 

An  active  process  in  the  MULTICS  system  cycles  through  five 
execution states: running, ready, waiting, blocked and stopped. The
execution state not only describes a process' processor contention
characteristics, but also suggests how that process is competing for
memory resources.

Only  running  and  waiting  processes  are  considered  eligible 
to  directly  compete  for  pages  of  core  memory  at  any  one  time.  Eligibility 
refers  to  the  depth  or  degree  of  multiprogramming  and  is  first  conferred 
on  a  ready  process  when  that  process  attains  highest  relative  priority 
among  noneligible  ready  processes  and  when  its  core  requirements,  when 
added  to  those  of  the  eligible  processes,  do  not  exceed  the  total  available 
core.  Eligibility  is  withdrawn  when  a  process  uses  up  its  time-slice 
allotment,  completes  an  interaction  or  otherwise  enters  a  dormant  (blocked 
or  stopped)  state. 

A  running  process  may  attempt  to  capture  as  much  core  as  it 
needs.   It  will  be  restricted  in  its  attempts  only  by  the  competing 
demands  of  processes  that  are  simultaneously  executing  on  the  processor. 
A  waiting  process  (differentiated  from  a  blocked  process  by  the  predictably 
short  period  of  time  it  has  to  wait  for  a  system  event,  for  example, 
the arrival of a page into core) remains eligible to compete for core
and  retains  its  favorable  queue  position.   In  general,  because  a  waiting 
process  is  not  actually  executing,  attrition  can  occur  in  its  core  holdings 
due  to  demands  made  by  executing  processes.   Since  wait  periods  are 
expected  to  be  relatively  short,  however,  there  are  only  short  periods 
between  the  wait  and  running  states  of  a  process  and,  therefore,  minimal, 
if  any,  attrition  of  the  waiting  processes'  core  holdings  occurs. 

The  ready,  blocked  and  stopped  processes  share  the  same  core 
competition  status  in  that  they  are  all  "losers".  Because  these  processes 
are  not  eligible,  they  cannot  acquire  core  pages.  The  executing  processes 
fulfill  their  core  requirements  at  the  expense  of  these  noneligible 
processes  and  thus  these  latter  continue  to  lose  what  pages  they  previously 
had  resident  in  core.  The  longer  a  process  is  not  eligible,  the  fewer 
pages  it  can  expect  to  have  in  core. 

As  stated  earlier,  a  process  receives  a  dispatching  or  scheduling 
priority  based  on  the  load  it  will  place  on  the  system.   Since  in  general 
a  command's  duration  is  not  known  in  advance,  an  adaptive  technique  is 
used  to  dynamically  estimate  the  processor  requirements  of  each  process. 
In  the  MULTICS  scheduler,  the  assumption  is  made  that  every  process 
arriving  on  the  ready  list  for  the  first  time  will  execute  a  short  command 
and,  therefore,  deserves  a  high  priority  position  on  the  ready  list. 
Associated with the position is some fixed time allotment t. When a
process  is  picked  to  compete  directly  for  processor  and  core  resources, 
i.e.,  is  eligible,  the  command  may  run  to  completion.   If  the  allotted 
time  is  exhausted,  a  timer  run-out  mechanism  will  halt  execution  of  the 
process  and  it  will  then  be  assigned  to  a  lower  priority  position.  Each 
lower priority position awards the process an increased allotment of
time  up  to  some  maximum  until  it  completes  execution.  The  processing 
time allotment associated with the r-th priority position is approximately $2^{r-1} t$.

Figure  2.6  illustrates  a  convenient  way  to  conceptualize  the 
MULTICS  dispatching  of  processes.  Even  though,  in  fact,  only  one 
ready  list  exists  in  the  MULTICS  scheduling  scheme,  this  single  list 
effectively  functions  as  a  set  of  n  priority  queues.  The  processing  time 
allotment  in  queue  1  is  one  second  and  approximately  doubles  in  each  queue 
up to queue k. Processes are served FIFO at each priority level. Exact
implementation of this straightforward algorithm becomes fairly complex
in the MULTICS system and the reader is referred to other authors
[GRE74, ORG72] for a more detailed discussion.
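
The doubling of the time allotment from one priority level to the next can be illustrated with a short Python sketch. The function name, the choice of the number of levels, and the optional cap are assumptions of this example. The text states only that queue 1 receives one second, that the allotment approximately doubles per queue, and that some maximum exists.

```python
def multics_allotments(num_levels, first_allotment_s=1.0, max_allotment_s=None):
    """Approximate time allotment (seconds) for each priority level.

    The allotment starts at first_allotment_s and doubles at each lower
    priority level. max_allotment_s, if given, caps the growth.
    """
    allotments = []
    for level in range(1, num_levels + 1):
        t = first_allotment_s * 2 ** (level - 1)
        if max_allotment_s is not None:
            t = min(t, max_allotment_s)
        allotments.append(t)
    return allotments
```

For four levels this gives 1, 2, 4 and 8 seconds, so a process that has exhausted every earlier allotment has already consumed 7 seconds of processor time before reaching the fourth level.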

In  keeping  with  the  policy  of  giving  good  response  to  interactive 
users  that  issue  commands  of  short  duration,  preemption  is  permissible  in 
the  MULTICS  system.  A  higher  priority  process  can  preempt  a  presently 
eligible  process  of  lower  priority.  The  preempted  process  is  favorably 
treated,  relatively  speaking,  in  that  it  is  placed  at  the  top  of  its 
priority  queue  with  a  time  allotment  equal  to  whatever  time  is  unused 
from its last scheduling allotment.

2.2.1.5. CANDE - University of California at San Diego (UCSD)
Time-Sharing System

The CANDE interactive computing system espouses a straightforward
approach which operates basically by distinguishing burst-oriented
processes from those that are compute bound. Processes which are
estimated to require a "small" amount of CPU time, as determined by the
fact that they did not exceed their allotted time-slice during their
most recent execution state, are served first-come, first-served from a
high priority queue. Processes which incurred a timer run-out during
their last run period are served first-come, first-served from a low
priority queue.

Figure 2.6. MIT-MULTICS Scheduling*

*This illustration is an approximate description of the MULTICS scheduling
function. In fact, only one dispatching queue is maintained, and the
system has two processors.

CANDE  is  a  virtual  memory  system  which  multiprograms  processes 
into  "subspaces"  of  real  core.   If  a  process  has  the  highest  dispatching 
priority  and  there  is  adequate  memory  available  for  a  swap-in,  then 
the  process  receives  its  required  core  storage.  Memory  assigned  to  a 
process is increased up to the fixed size of the subspace whenever the
process  exceeds  its  currently  allotted  space.  There  are  five  events 
which  can  cause  a  process  to  be  swapped  out.  These  include  an  input  wait, 
an  output  wait,  a  process  suspension,  a  time-slice  allotment  expiration  and 
a  core  demand  in  excess  of  the  subspace  size  allocated  to  the  process 
during  its  previous  swap  into  core. 

The  primary  goal  of  the  subspace  option  is  to  allow  a  large 
number  of  burst-oriented  processes  to  run  without  freezing  memory 
resources  during  their  dormant  periods.  Memory  is  freed  by  immediately 
swapping  the  process  to  disk  when  it  becomes  dormant.  Because  a  large 
number  of  tasks  are  bidding  for  a  limited  memory  resource,  tasks  which 
discontinue  their  burst-orientation  (become  compute  bound)  have  an 
artificial  burst  rate  imposed  upon  them.  This  artificial  burst  rate 
is  called  the  process'  time-slice. 

CANDE has two priority levels (queues) for selecting ready
tasks for execution, or swapping into core, as illustrated in Figure 2.7.
The  lower  priority  queue  contains  processes  which  exceeded  their  time 
slice  during  their  last  swap-in.  The  higher  priority  queue  contains 
all  other  ready  processes.  These  high  priority  processes  are  those  which 
are  new  to  the  system,  which  have  received  input  for  which  they  were 
waiting,  which  have  output  at  least  half  of  the  data  excess  which  originally 
caused  them  to  be  swapped  out  or  which  have  been  awakened  from  swap-out 
suspension.  Within  this  high  priority  or  "demand  status"  queue  processes 
are  ordered  first-in,  first-out  as  they  are  within  the  lower  priority 
queue.  Lower  priority  queue  processes,  or  "time-sliced"  processes,  are 
swapped  into  available  memory  only  if  there  are  no  demand  status  swap 
requests  which  can  be  satisfied. 

Figure 2.7. UCSD-CANDE Scheduling

*n and c are defined on page 34. Jobs feed into the first priority queue
if their last removal from execution was caused by a wait or blocked
state and they feed into the second priority queue if their last cause
of removal from execution was a timer run-out.


The  time-slice  allocated  to  each  process  when  it  is  swapped 
into  core  is  computed  on  an  individual  basis  and  does  not  depend  exclusively 
on  priority  level.  Before  allocating  a  processor  to  a  swappable  process, 
both  its  allowable  processor  time-slice  and  its  allowable  elapsed  time- 
slice  are  checked.   If  either  has  been  exceeded,  a  new  slice  is  computed 
as  defined  by  the  formulas  given  below. 

The formulas for computing a time-slice are:

Processor Time-Slice: $T = (n \cdot k_1 + c + p + 8) \cdot k_2 + m \cdot 416667$

Elapsed Time-Slice: $E = T \cdot r$

where

n is the slice number. When a process is swapped out due to a
demand condition, its slice number is set to zero. Each time
a process is swapped because of exceeding its (processor or
elapsed) time-slice, its slice number is incremented by one.
This number is subject to a maximum value of 7.

c is the core space used by the process in chunks (1 chunk = 990 words).

m is the minimum time-slice in seconds (m = 1).

k_1 is 4.

k_2 is 5000.

p is priority (p = 51).

r is the ratio of elapsed time to processor time.

Time-slice units are 2.4 μsec.
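
As a worked illustration of the time-slice formulas, the Python sketch below evaluates T and E for given n, c and r. It rests on two readings of the source that should be treated as assumptions: the additive constant is taken to be 416667 units per second of minimum time-slice, and the time-slice unit is taken to be 2.4 microseconds (so that 416667 units equal one second). The function and constant names are hypothetical.

```python
K1 = 4                       # weight of the slice number
K2 = 5000                    # scale factor, in time-slice units
PRIORITY = 51                # p
MIN_SLICE_SECONDS = 1        # m
UNITS_PER_SECOND = 416667    # assumed: one second in 2.4-microsecond units
MAX_SLICE_NUMBER = 7         # upper bound on n
UNIT_SECONDS = 2.4e-6        # assumed length of one time-slice unit


def processor_time_slice(n, c):
    """Processor time-slice T in time-slice units.

    n: slice number (capped at 7), c: core space in chunks.
    """
    n = min(n, MAX_SLICE_NUMBER)
    return (n * K1 + c + PRIORITY + 8) * K2 + MIN_SLICE_SECONDS * UNITS_PER_SECOND


def elapsed_time_slice(n, c, r):
    """Elapsed time-slice E = T * r, in time-slice units."""
    return processor_time_slice(n, c) * r


def units_to_seconds(units):
    """Convert time-slice units to seconds under the assumed unit length."""
    return units * UNIT_SECONDS
```

Under these assumptions, a process using 10 chunks on its first slice (n = 0) receives T = 761667 units, or roughly 1.8 seconds of processor time. Each further time-slice expiration adds 20000 units until n reaches 7.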

2.2.2.   Benchmark  Jobs 

Three benchmark jobs were distributed on each of the computing
systems studied, with some exceptions. The first benchmark job was
dominated by arithmetic operations, the second consisted of manipulations
of bit strings and the third was input/output bound. Listings of these
benchmark jobs as they were stored and used on each computing system are
presented in Appendix B.* These jobs were chosen for their distinct
claims on the system resources of CPU processing, core use and I/O
channel utilization. These particular listings were generated at
MIT-MULTICS. Job listings from all other installations are essentially
identical.

*These benchmark jobs were generated by members of a research group
working under the direction of Dr. P. A. Alsberg, Center for Advanced
Computation, University of Illinois, Urbana-Champaign. They were
used in this research with Dr. Alsberg's permission.

The "number cruncher" or arithmetic benchmark job was written
in  standard  FORTRAN  and  generates  a  100  x  100  correlation  matrix  for  a 
100  x  100  input  array  called  DATA.  A  main  program  dimensions  all 
arrays  and  appropriately  initializes  arrays  and  variables.  This  main 
program  then  calls  on  a  subroutine  to  generate  the  required  correlation 
matrix.   This  benchmark  places  demands  on  the  system  resources  of  core 
(more  than  20  kilobytes  of  core  are  required  just  for  array  storage)  and 
on CPU processing (the innermost loop in the subroutine is executed
approximately $.5 \times 10^{6}$ times).

The bit string manipulating benchmark job was designed to place
its main system resource demand on the CPU alone. This standard PL/I
program takes a 100 x 100 input matrix called REALITY and searches for a
path of entries that are ones that can be traversed from the top row to
the bottom row, traveling only vertically and horizontally between
adjacent squares. A second matrix (FOUND) of the same dimensions as
REALITY is used as an internal work space. Initially all entries in FOUND
are zeroes. When a valid path is discovered from the first row of REALITY
to an adjacent square, the corresponding neighboring element in FOUND
becomes a one. Thus, the elements in FOUND that are ones represent
elements in REALITY which can be reached from the first row. At each
iteration, an element in FOUND becomes a one if the corresponding element
of REALITY is a one that is connected to a valid path from the top row.
The process terminates either when no new ones appear in FOUND or when an
element in the bottom row of FOUND becomes a one. This program is a bit
manipulating benchmark since matrices are stored and referenced as bit
strings.
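
The iteration described above can be sketched in Python as follows. This is an illustrative reconstruction of the algorithm from the prose, not the PL/I benchmark listed in Appendix B. It uses nested lists rather than bit strings, and the function name `path_exists` is an assumption. It also assumes a non-empty rectangular matrix of zeroes and ones.

```python
def path_exists(reality):
    """Return True if ones in `reality` connect the top row to the bottom row,
    moving only vertically and horizontally between adjacent squares."""
    rows, cols = len(reality), len(reality[0])
    # FOUND starts as all zeroes, then is seeded with the ones of the top row.
    found = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        found[0][j] = reality[0][j]

    changed = True
    while changed:
        # Terminate as soon as any element of the bottom row is reached.
        if any(found[rows - 1]):
            return True
        changed = False
        for i in range(rows):
            for j in range(cols):
                if found[i][j] or not reality[i][j]:
                    continue
                neighbors = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if any(0 <= a < rows and 0 <= b < cols and found[a][b]
                       for a, b in neighbors):
                    found[i][j] = 1
                    changed = True
    # No new ones appeared in FOUND during the last pass.
    return any(found[rows - 1])
```

Each pass rescans the whole matrix, which is what makes the job CPU bound: a long winding path may require many passes before the bottom row is reached or FOUND stops changing.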

The  third  and  simplest  of  the  benchmark  jobs  was  also  written 
in standard PL/I. It was designed to make its main resource demand on
the I/O mechanism of the computing system. The program opens a file and
writes  1,000  250-word  records  into  it.   It  proceeds  to  close  the  file, 
reopen  it,  read  the  same  1,000  records  back  and  finally  closes  the  file 
once  again. 

Table 2.2 indicates exactly which benchmarks were run at each
of  the  computing  centers  and  explains  why  certain  of  the  benchmarks 
were  omitted. 


Table 2.2. Benchmark Jobs Run at Various Computing Centers

System        Number Crunching   Bit Manipulating   I/O Bound
              Benchmark          Benchmark          Benchmark

AMES-TSS      Yes                Yes                Yes
BBN-TENEX     Yes                PL/I is not available on this system
CCN-TSO       Yes                Yes                Yes
MIT-MULTICS   Yes                Yes                Yes
UCSD-CANDE    Yes                PL/I is not available on this system

2.2.3. Load Level

Each of the computing systems under study was arbitrarily said
to have ten distinct load levels within which it operated. In general,
the load levels are uniformly distributed intervals in which the value
(e.g., number of users, load average or utilization fraction) of the
two end points and the interval width depend on the load measure for a
particular system and its observable load range, respectively. The
k-th load level for the i-th system, $L_{i,k}$, is defined by an interval

$$L_{i,k} = \left[ \left( (s_i/10) \cdot (k-1) \right) + 1,\ (s_i/10) \cdot k \right]$$

where $s_i$ is a measure of load in a saturated system i.

For example, UCSD measures load in number of users and its
highest observable load level was taken to be 30 users. The fifth load
level, therefore, would be defined as

$$L_{UCSD,5} = \left[ \left( (30/10) \cdot 4 \right) + 1,\ (30/10) \cdot 5 \right]$$

or

$$L_{UCSD,5} = [13, 15]$$
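
The general load level definition is easy to check numerically. The Python sketch below is a minimal illustration. The function names are hypothetical, and the lookup helper assumes the load is measured in whole units (for example, number of users) on a system that follows the general definition without the exceptions discussed in this section.

```python
import math


def load_level_interval(saturation_load, k, num_levels=10):
    """End points of the k-th load level for a system whose saturated
    load measure is saturation_load."""
    width = saturation_load / num_levels
    return (width * (k - 1) + 1, width * k)


def load_level_for(load, saturation_load, num_levels=10):
    """Load level (1..num_levels) into which an observed load falls."""
    width = saturation_load / num_levels
    return min(num_levels, max(1, math.ceil(load / width)))
```

For the UCSD example, `load_level_interval(30, 5)` gives the interval from 13 to 15 users, and an observation of 13, 14 or 15 users maps back to level 5.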


Several  exceptions  to  this  load  level  definition  arise  owing  to 
the  individual  characteristics  of  the  systems  being  studied.  AMES67 
measures  increasing  load  in  terms  of  a  decreasing  function  in  direct 
contrast to all the other systems under consideration. The AMES67
measure is a utilization fraction ranging from 1.0 for no load to 0.0
for extremely heavy loads. In this case, the load interval is defined
as follows:

$$L_{AMES67,k} = \left[ .1 \cdot (10-k),\ (.1 \cdot (11-k)) - .001 \right].$$
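
The inverted AMES67 definition can be tabulated in the same way. In this small Python sketch the function name is an assumption, and the rounding to three decimal places is added only to suppress floating-point noise.

```python
def ames_load_level_interval(k):
    """End points of the k-th AMES67 load level (utilization fraction).

    Level 1 is the lightest load (fraction near 1.0) and level 10 the
    heaviest (fraction near 0.0).
    """
    if not 1 <= k <= 10:
        raise ValueError("k must be between 1 and 10")
    lower = round(0.1 * (10 - k), 3)
    upper = round(0.1 * (11 - k) - 0.001, 3)
    return (lower, upper)
```

Level 1 evaluates to the interval from .900 to .999 and level 10 to the interval from .000 to .099, so higher level numbers again correspond to heavier loads even though the measure itself decreases.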


Other exceptions to the load level definitions occur in the
widths of the most heavily loaded levels (that is, load levels 8, 9 and
10). Since response times on some of the systems studied grow very
large with increasing loads (response times rise to approximately one
hour on the BBN-TENEX system under heavy loads), it becomes difficult
to take a response time measurement within a load level that is too
narrowly defined. The load varies more during these longer periods than
during the lightly loaded, short response time periods. For this
reason, the widths of load levels were sometimes broadened at the high
end of the load level spectrum (see levels 6-10 of BBN-TENEX in Table 2.3).

Still another adjustment was made in the load level definition
for the BBN system running TENEX. The TENEX load measure is one of
"load average" defined as the ratio of number of runnable jobs (jobs not
blocked for I/O or otherwise in a wait state) to running jobs (jobs which
are loaded in core and immediate potential candidates for CPU time-slices).
The rapidly changing nature of this measure, combined with the relatively
long response times for the TENEX system, even under moderate loads,
necessitated overlapping load level definitions to obtain any valid
response time measurements. For example, $L_{BBN,8} = [10.0, 14.0]$ and
$L_{BBN,9} = [12.0, 16.0]$, where the end points of the intervals are load
averages. Table 2.3 contains a complete listing of the load level
definitions for the five systems under study.

Table 2.3. Load Level Definitions

LOAD    AMES67-TSS     BBN-TENEX   CCN-TSO     MIT-MULTICS   UCSD-CANDE
LEVEL   (Utilization   (Load       (Number     (Number       (Number
        Fraction)      Average)    of Users)   of Users)     of Users)

 1      (.900,.999)    ( 0, 2)     ( 1, 1)     ( 1, 7)       ( 1, 3)
 2      (.800,.899)    ( 1, 3)     ( 2, 2)     ( 8,14)       ( 4, 6)
 3      (.700,.799)    ( 2, 4)     ( 3, 3)     (15,21)       ( 7, 9)
 4      (.600,.699)    ( 3, 5)     ( 4, 4)     (22,28)       (10,12)
 5      (.500,.599)    ( 4, 8)     ( 5, 5)     (29,35)       (13,15)
 6      (.400,.499)    ( 6,10)     ( 6, 6)     (36,42)       (16,18)
 7      (.300,.399)    ( 8,12)     ( 7, 7)     (43,49)       (19,21)
 8      (.200,.299)    (10,14)     ( 8, 8)     (50,56)       (22,26)
 9      (.100,.199)    (12,16)     ( 9, 9)     (57,63)       (26,30)
10      (.000,.099)    (14,>14)    (10,>10)    (64,70)       (31,>31)

The  system  load  was  recorded  before  and  after  each  response 

time  measurement.  A  measurement  was  said  to  be  taken  at  one  of  the  ten 

possible  load  points  only  if  both  load  recordings  fell  within  the 

interval  defined  by  that  respective  load  level. 
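
The acceptance rule for a measurement can be sketched in a few lines of Python. The function name and the sample load readings are hypothetical; only the two BBN-TENEX intervals are taken from Table 2.3. Because the TENEX intervals overlap, a single measurement may qualify for two levels, and the source does not say how such cases were assigned, so the sketch simply reports every qualifying level.

```python
# Overlapping BBN-TENEX intervals for load levels 8 and 9 (from Table 2.3).
BBN_LEVELS = {8: (10.0, 14.0), 9: (12.0, 16.0)}


def levels_for_measurement(load_before, load_after, levels):
    """Return the load levels whose interval contains BOTH load recordings."""
    return [
        k
        for k, (low, high) in sorted(levels.items())
        if low <= load_before <= high and low <= load_after <= high
    ]


# Hypothetical load-average readings taken before and after a run.
both = levels_for_measurement(12.5, 13.5, BBN_LEVELS)
one = levels_for_measurement(10.5, 13.0, BBN_LEVELS)
none = levels_for_measurement(11.0, 15.0, BBN_LEVELS)
print(both, one, none)
```

The third call shows why narrow intervals are hard to hit on a slow system: the load drifted out of level 8 before the run finished and had not been inside level 9 when it started, so the observation would be discarded.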

2.2.4.   Response  Time 

The  main  performance  measure  to  the  user  of  an  interactive 
system  is  response  time.  Users  are  happy  if  the  system  reacts  within  a 
time  span  they  have  learned  to  expect.   If  the  system  does  not  perform 
as  expected,  user  discontent  rises.  Frustration  increases  rapidly 
when  expectations  of  immediate  response  are  thwarted.  However,  frustration 
increases  much  more  slowly  when  the  expected  turnaround  time  is  such 
that  the  user  turns  attention  away  from  the  response  time  to  other 
activities. This latter expected response time may range from approximately
ten minutes to several hours.

The  response  time  to  a  "run"  command,  given  that  the  required 
CPU  time  for  the  program  to  be  run  is  less  than  one  minute  or  so,  hovers 
between  two  response  classes.   On  the  one  hand,  if  the  system  is  lightly 
loaded,  program  execution  may  be  completed  in  a  few  minutes.   In  this 
case,  the  users  would  probably  devote  their  attention  solely  to  waiting 
for the system response. On the other hand, if the system is heavily
loaded, full program execution may require as much as an hour, or even
more,  and  users  would  turn  their  attention  to  some  other  activity  while 
they  were  waiting. 

In  order  to  measure  and  compare  response  times  to  run  commands 
on  heterogeneous  computing  systems,  a  definition  of  response  time  is 
required  that  will  be  consistent  across  all  systems,  exhibit  a  meaningful 
association  to  time-sharing  system  performance  and  also  correspond  to 
the users' conception of how long they have waited. J. F. Maranzano [MAR73]
has proposed such a definition. Maranzano's definition of interactive
response  time  identifies  the  interval  "from  the  end  of  user  typing  of  a 
command  (often  called  the  carriage  return)  to  the  first  character  of 
output  on  the  terminal"  as  the  critical  time  span.  This  response  time 
definition  meets  the  criteria  described  above  in  that  it  is  measurable 
on  all  systems,  the  distribution  of  its  values  under  varying  circumstances 
is  a  description  of  system  performance  and  users  stop  their  waiting 
activity  at  the  first  physical  sign  of  output  on  the  terminal. 

This definition will be slightly modified in this study to the
following form:

DEFINITION:  Interactive response time is the number of
             seconds which elapse from the end of user
             typing of a command (carriage return) to the
             first character output on the terminal indicating
             the completion of execution of the command.

The first output character is required to be that which signals the
completion of command execution because some commands print informative
messages at the beginning of their execution.

Separate classes of commands are defined by Maranzano to ensure
that uncontrolled variability of times within each class will be minimized.
We are concerned here only with the respective "load and run" command
associated with each computing system that directs the system to load
the (previously compiled) object version of a particular program and to
proceed with its execution. Since our response time comparison is limited
to this single command, no further command classifications are required.

Two of the systems under study (BBN-TENEX and UCSD-CANDE) trace
and record the interactive elapsed time automatically and report it to
the user upon completion of a command execution. For the other three
systems, the response time was measured by utilizing system clocks in
various ways. The exact command sequence used in each system measurement
is presented in Table 2.4. The average response time for the execution
and printout of the TIME command information was calculated in each
case and accounted for in the final determination of the "load and run"
response time. ARPA network transmission time, which is presently less
than .1 second in either direction, was not isolated in the response time
determination (it was recorded as part of the individual system response
time). All response time measurements were made from a terminal, using
commands available to all users of the system. No special hardware or
software monitors were used.

Table 2.4. Command Sequence for Systems' Measurement

SYSTEM         COMMANDS*            COMMENTS

AMES67-TSS     TIME?                The TIME? command returns the
               CALL PROGRAM         wall clock time.
               TIME?

BBN-TENEX      PROGRAM NAME         Response time to this run command
                                    is returned by the system
                                    automatically.

CCN-TSO        TIME                 The TIME command gives the total
               GOCOMPILER NAME      connect time.
               TIME

MIT-MULTICS    TIME                 TIME is a user written subroutine
               PROGRAM NAME         that calls and displays the
               TIME                 system clock time.

UCSD-CANDE     EXECUTE PROG NAME    The EXECUTE command returns the
                                    response time automatically upon
                                    completion.

*All the run commands load (if it is not already loaded) and
execute the object module of the program.
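
For the three systems measured by bracketing the run command with TIME commands, the arithmetic might look like the Python sketch below. The source says only that the average TIME command response was "accounted for"; subtracting one average TIME overhead from the clock difference is an assumption, and the function names and sample clock readings are hypothetical.

```python
def seconds(hh, mm, ss):
    """Convert a wall-clock reading to seconds since midnight."""
    return hh * 3600 + mm * 60 + ss


def load_and_run_response_time(time_before, time_after, time_cmd_overhead):
    """Seconds between the two TIME readings, less the average cost of one
    TIME command (assumed correction; the exact adjustment is not given)."""
    return (time_after - time_before) - time_cmd_overhead


# Hypothetical readings: TIME at 14:02:10, run command, TIME at 14:03:05,
# with an average TIME command response of 3 seconds.
t = load_and_run_response_time(seconds(14, 2, 10), seconds(14, 3, 5), 3)
print(t)
```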
3.   MEASURING TIME-SHARING SYSTEMS

Response times were measured and recorded at the various
observable load levels, for each of the appropriate benchmark jobs, on
each of the five computing systems. The data were subsequently subjected
to curve-fitting analysis in order to formulate statistically significant
quadratic, cubic or exponential representations of the response time-load
level relationships. Linear and nonlinear least squares regressions
were performed.

The curve fitting was done using a package program authored
by J. A. Middleton titled "Least-Squares Estimation of Non-Linear
Parameters--NLIN" [MID68]. User subroutines indicating the function to
which the data are to be fit are called by the main program, which then
iteratively attempts to determine the required variable coefficients
(a and p in the log-normal case). The algorithm used selects an
optimized correction vector for the coefficients by interpolating between
the vector obtained by the gradient method and that obtained by a
Taylor's series expansion truncated after the first derivative. Iteration
is applied to this vector according to the least squares method of
estimating parameters until one of the several stopping criteria is met.
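
The interpolation between a gradient step and a truncated Taylor series step resembles what is commonly called the Marquardt (Levenberg-Marquardt) method. The Python sketch below illustrates that idea for a two-parameter exponential y = a * exp(b * x). The model form, the function name fit_exponential, the damping schedule and the stopping rules are all assumptions made for illustration; they are not taken from the NLIN package.

```python
import math


def _model(a, b, x):
    """Exponential model; overflow is reported as infinity so the step is rejected."""
    try:
        return a * math.exp(b * x)
    except OverflowError:
        return math.inf


def fit_exponential(xs, ys, a, b, lam=1e-3, max_iter=200, tol=1e-10):
    """Fit y = a * exp(b * x) by a damped least-squares iteration."""

    def sse(pa, pb):
        total = 0.0
        for x, y in zip(xs, ys):
            r = y - _model(pa, pb, x)
            total += r * r
        return total

    err = sse(a, b)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r for the two parameters.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            da, db = e, a * x * e  # partial derivatives with respect to a and b
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Large lam approaches a gradient step; small lam approaches the
        # step from the first-order Taylor expansion.
        maa = jaa * (1.0 + lam)
        mbb = jbb * (1.0 + lam)
        det = maa * mbb - jab * jab
        if det == 0.0:
            break
        step_a = (mbb * ga - jab * gb) / det
        step_b = (maa * gb - jab * ga) / det
        new_err = sse(a + step_a, b + step_b)
        if new_err < err:
            improvement = err - new_err
            a, b, err = a + step_a, b + step_b, new_err
            lam /= 10.0
            if improvement < tol:
                break  # stopping criterion: negligible improvement
        else:
            lam *= 10.0
            if lam > 1e12:
                break  # stopping criterion: damping has grown too large
    return a, b, err


# Synthetic, noise-free data generated from a = 20, b = 0.3.
xs = list(range(1, 11))
ys = [20.0 * math.exp(0.3 * x) for x in xs]
a_hat, b_hat, final_err = fit_exponential(xs, ys, a=15.0, b=0.25)
print(a_hat, b_hat, final_err)
```

Like any iterative fit of this kind, the sketch needs reasonable starting values; a poor initial guess can leave it at a fit with a large residual.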

The set of criteria used to choose the curve that best fit the
data included comparison of the residual mean square of each of the fits
(these are presented in Appendix C), consideration of the possible and
most probable shape of the curve for the time-sharing system under
consideration, and special handling of "outlying" or obviously exceptional
data points. A discussion of the results of the analysis for each of
the computing systems under study is presented below. The plots of the
curve fits presented for each benchmark on each computing system display
the best pair of fits in each case and indicate which of the two fits
was finally chosen.

Also included in the discussion of the individual system
results is a determination of whether or not "saturation" occurs within
any of the observable load level intervals. Mathematically speaking, a
system is said to be saturated when the probability of zero users waiting
for service becomes less than some arbitrarily small number. This
definition may be related to a quadratic, cubic or exponential response
time curve that is relatively flat and then becomes concave upward by
determining the point (or load level) at which the slope of the curve
becomes greater than some arbitrarily small number. Alternatively, when
the curve fit tends to have linear characteristics (slow steady rising),
or in the interest of relating saturation to the users' experience with
the system, saturation may be defined as that point or load level at
which the response time exceeds users' expectations of waiting time.
For the types of benchmarks and systems involved in this study, except
for BBN-TENEX, two minutes was taken as a reasonable time span within
which to expect job completion. Not all systems exhibit definite
saturation characteristics within the observable range of the data.
A summary table of saturation levels in each of the systems is presented
in Table 3.1. The information given there is explained more fully in the
discussions of individual systems that follow. The processing times
required by each benchmark in each system are presented in Table 3.2.
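
The two working definitions of saturation can be applied to a fitted curve as in the following Python sketch. The function names, the slope threshold of 30 seconds per load level, and both example curves are hypothetical; only the 120 second limit comes from the text.

```python
import math


def saturation_by_threshold(curve, levels=range(1, 11), limit=120.0):
    """First load level at which the fitted response time exceeds `limit`
    seconds, or None if it never does within the observed levels."""
    for k in levels:
        if curve(k) > limit:
            return k
    return None


def saturation_by_slope(curve, levels=range(1, 11), max_slope=30.0):
    """First load level at which the rise from the previous level exceeds
    `max_slope` seconds per level, or None."""
    prev = None
    for k in levels:
        if prev is not None and curve(k) - curve(prev) > max_slope:
            return k
        prev = k
    return None


def linear(k):
    """Hypothetical slowly rising curve."""
    return 20.0 + 9.0 * k


def sharp(k):
    """Hypothetical curve that is flat and then rises steeply."""
    return 30.0 + 0.05 * math.exp(k)


results = (
    saturation_by_threshold(linear),
    saturation_by_slope(linear),
    saturation_by_threshold(sharp),
    saturation_by_slope(sharp),
)
print(results)
```

For the slowly rising curve neither criterion fires within ten levels, which corresponds to the "or above" and "above 10th" entries in Table 3.1. For the steep curve the two criteria can disagree by a level, which is why the criterion used is reported alongside each saturation level.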
Table 3.1. Systems' Saturation Level

SYSTEM         SATURATION CRITERIA      SATURATION    COMMENTS
                                        LOAD LEVEL

AMES-TSS       Response time rises      10th          Only a gradual, steady
               above 120 seconds.       or above      increase in response
                                                      times occurs.

BBN-TENEX      Sharp rise in            3rd           System response times
               response time curve                    tend to be of the
               combined with                          magnitude of batch
               excessively high                       processing times rather
               response times                         than interactive
               (about 5 minutes).                     processing times.

CCN-TSO        Fairly sharp rise        above         System is generally
               in response time         10th          lightly loaded and
               curve combined with                    saturation did not
               rise of response                       occur in the observable
               time above 120                         range of the data.
               seconds.

MIT-MULTICS    Extremely sharp          8th           System response times
               rise in response                       conform most closely
               time curve combined                    to popular response
               with rise of response                  time expectations.
               time above 120
               seconds.

UCSD-CANDE     Response time rises      8th           Only a gradual, steady
               above 120 seconds.       or above      increase in response
                                                      time occurs.
Table 3.2. Average Benchmark Processing Times*

               NUMBER       BIT            FILE
               CRUNCHER     MANIPULATOR    FLOGGER

AMES-TSS         21           16             7
BBN-TENEX        63           NA             NA
CCN-TSO           6            5             2
MIT-MULTICS      45            2             3
UCSD-CANDE       57           NA             NA

*All processing times are given in seconds. NA, not applicable,
indicates that the benchmark was not run at the computing
center in question.
3.1.   Analysis of Individual System Data

3.1.1.   AMES-TSS

When a dispatching algorithm for time-sharing systems assigns
processes to one of a set of increasingly lower priority queues depending
mainly on each process's former behavior in using its allotted time-slice
(like BBN-TENEX), the response time curve for that system is generally
almost constant for a lightly loaded system and begins to rise rapidly
when the load increases beyond some critical point. On the other hand,
when some other criterion is the main factor in determining queue position,
such as the amount a user is willing to pay, or the paging behavior of the
process, then the response time curve appears to rise slowly as the load
increases, in a strictly linear fashion. AMES-TSS is one such system.

As  described  earlier,  core  usage  characteristics  of  a  process 
are  the  main  factor  in  determining  queue  position  at  AMES.   This  implies 
that  a  process  with  a  small  working  set  size  and  good  locality  will 
stay  in  the  top  priority  queues,  regardless  of  how  much  service  it  is 
requiring  of  the  CPU.  The  response  time  of  a  process  can  increase 
linearly  with  increased  load,  therefore,  and  not  necessarily  exhibit 
a  sharp  rise  at  some  critical  saturation  point. 

This phenomenon can be observed in the response time curves for
all three of the benchmark jobs run at AMES, shown in Figures 3.1(a)
through 3.1(c). Although the exponential, quadratic and cubic curves
were chosen as best fits for the arithmetic, bit string manipulator and
I/O bound benchmarks, respectively, within the observable load span all
three curves rise slowly but steadily, in an almost linear fashion.
Because of the linear shapes of the curves, saturation in this system
must be considered as occurring in the load level in which response
time rises above 120 seconds. This rise does not occur within the
observable range of the data except for the I/O bound benchmark, in
which case the curve barely climbs above two minutes between the 9th
and 10th load levels. The relatively long response time for this
benchmark takes on significance in view of the fact that the I/O
bound job required less execution time by a factor of 1:2 as compared
with the bit string manipulator and 1:3 as compared with the arithmetic
benchmark job.

Figure 3.1(a). Statistical Results - AMES-TSS
Figure 3.1(b) (continued). Statistical Results - AMES-TSS
Figure 3.1(c) (continued). Statistical Results - AMES-TSS

*See Table 2.3 for the correspondence between Load Levels 1-10 and
the system measure of busyness for each of the five systems studied.

Since "pi", a measure of core contention, was used as the
load measure in this system, the question arises as to whether using
number of users as load measure would yield different results. Number
of users is an undesirable measure for load in the AMES system because
of the tendency for local users to stay logged in for long periods of
time, regardless of whether or not they are doing useful work. The
response time data were plotted against number of users, however, and
linear curves similar to the ones already displayed resulted as best
fits. But, as can be observed from Table 3.3, the residual mean squares
(RMS) were larger in every case for these plots as compared to the
response time versus pi plots.

The data collected on the AMES-TSS system are complete, even
though few valid observations were recorded at the 9th and 10th load
levels, in the sense that AMES has adjusted its overall scheduling
scheme such that the value for pi very seldom goes below 0.2. A new
"Resource Allocation Scheme" attempts to guarantee some level of service
to authorized priority users at various times of the day, e.g., group 1
receives top priority between 8 a.m. and 10 a.m., group 2 from 10 a.m.
to 12 noon and so on. The data, therefore, represent observations
over all the load levels that AMES-TSS will assume in its present
configuration.

Table 3.3. Residual Mean Squares for AMES-TSS Curve Fits

                       Residual Mean Square (RMS)

Benchmark          Using "no. of users"     Using "pi"
                   -vs- response time       -vs- response time

Number Cruncher    1.09 x 10^3              2.76 x 10^2
Bit Manipulator    2.25 x 10^2              1.45 x 10^2
File Flogger       1.05 x 10^3              1.03 x 10^3
3.1.2.   BBN-TENEX

Even the novice user of the TENEX system at BBN quickly forms
the impression that for a fairly good sized job, even with light loads,
response times are slow and they tend to increase very rapidly. The
exponential curve shown in Figure 3.2, chosen as the best fit to the
BBN number crunching benchmark data, readily verifies this impression
(as explained in Section 2.2.2, the bit manipulating and the file flogging
benchmarks were not run on this system). The data range is the largest
of all the systems studied, rising to a measured turnaround time of more
than one hour at the tenth load level. The slope of the curve rises
relatively rapidly, making a saturation point difficult to define. Only
measurements at the lowest load level were consistently under 120 seconds.
The BBN system response actually hovers between time-sharing and batch
expectations. The exponential curve fit reflects the success with which
the philosophy of the TENEX dispatching algorithm (which predicts
approximately exponential response times) is implemented in the total
BBN-TENEX system.

Of special interest in this system is the fact that the load
measure is not number of users as it was in the majority of systems, but
is the quantity defined as "load average" in the earlier description
of the TENEX system. With this quantity as independent variable, the BBN
data yield a better regression fit than any other set of data. The ratio
of regression sum of squares to total sum of squares is a satisfyingly
high 0.872 (see Appendix C).
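
The goodness-of-fit ratio quoted above can be computed as in the Python sketch below. It assumes that both sums of squares are taken about the mean of the observed response times, which is one common convention; the source does not state which convention the fitting package used. The function name and sample values are hypothetical.

```python
def regression_ratio(observed, fitted):
    """Ratio of regression sum of squares to total sum of squares,
    both taken about the mean of the observed values (assumed convention)."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_regression = sum((f - mean) ** 2 for f in fitted)
    return ss_regression / ss_total


# Hypothetical observed response times and the corresponding fitted values.
ratio = regression_ratio([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 6.0, 7.0])
perfect = regression_ratio([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(ratio, perfect)
```

A value near 1 means the fitted curve accounts for most of the variation in the measurements.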
Figure 3.2. Statistical Results - BBN-TENEX

The BBN data can be accepted as complete in the sense that
the observable range of load levels shows the interactive response time
for the arithmetic benchmark rising above an intolerably high one hour.
A user searching for a time-sharing system on which to run a job would
surely reject the BBN-TENEX option (except for some extenuating
circumstances such as free computing) when the load average rose above about
14.0 as it does in the 3rd load interval.

3.1.3.   CCN-TSO 

The CCN-TSO system is not often heavily loaded, with thirteen
users being the maximum load observed during this study. Moreover, the
processor is a powerful one and in the context of TSO's particular
dispatching algorithm, the CCN system required only 6 seconds of execution
time to execute the arithmetic benchmark. This was a performance
improvement of more than 3:1 over the next fastest system (AMES-TSS) and of more
than 10:1 over the slowest system (BBN-TENEX). Further, since within the
entire CCN computing system the TSO system is guaranteed a portion of
CPU service, but not a portion of I/O service, I/O interactions of a process
become the dominating factor in determining response time. This becomes
evident upon examination of Figures 3.3(a) through 3.3(c), in which response
times for the I/O benchmark are almost double those for either the arithmetic
or bit manipulating benchmark, even though the I/O benchmark requires less
than half the processing time of either of the latter two.

Both the arithmetic and bit manipulating benchmark sets of data
suggest that the CCN-TSO system has not reached saturation within the
observable range. Both curves are very slowly rising and stay below 120
seconds even in the 10th load interval. CCN personnel estimate that
their system will saturate with about twenty users and the data suggest
that this intuition may be valid. These two benchmarks require approximately
the same amount of processing time (about five seconds) and their response
time curves are similar.

Figure 3.3(a). Statistical Results - CCN-TSO
Figure 3.3(b) (continued). Statistical Results - CCN-TSO
Figure 3.3(c) (continued). Statistical Results - CCN-TSO

The I/O benchmark response time curve is effectively linear,
rising steadily as the load increases. An exponential-like quick rise
is not observed in this case because it is the I/O service and not the
processor service that is causing the increased waiting time. This
benchmark required only two seconds of processing, so it did not descend through
the priority dispatching queues. Rather, it spent time waiting as a
result of increased competition with all other TSO and total CCN system
jobs for limited I/O resources. This wait time grows linearly as the load
increases, and has a high degree of variability as is seen by observing
the actual data point values in the 7th through 10th load intervals.

3.1.4.   MIT-MULTICS

The MIT-MULTICS data as shown in Figures 3.4(a) through 3.4(c)
conform most closely to the popular conception of expected response time
from a time-sharing system. Considering the arithmetic benchmark plot,
the exponential curve chosen as the best fit is almost constant (and
below 120 seconds) until approximately the 8th load level. Between
the 8th and 9th load levels, the curve shoots up extremely sharply,
clearly indicating a saturated system. The combination of a fairly
fast processor and a scheduling algorithm that relies very heavily on
previous time-slice usage and a series of priority dispatching queues
work together to achieve this expected behavior. The approximately 45
seconds of required execution time allow the benchmark job to remain in
the system long enough to keep using up its formerly allotted time-slice
and descend through the priority queues. Position on a low priority
queue is of no significance until the probability that there are
processes waiting for service becomes greater than some arbitrarily small
number. This happens at the 8th load level.

Figure 3.4(a). Statistical Results - MIT-MULTICS
Figure 3.4(b) (continued). Statistical Results - MIT-MULTICS
Figure 3.4(c) (continued). Statistical Results - MIT-MULTICS

The  other  two  benchmarks  run  at  MIT  required  only  about  2  seconds 
of  processing  each  and  so  were  not  caught  up  in  the  descending  queue 
phenomenon.  They  received  excellent  response  times  regardless  of  the 
load  level. 

3.1.5.   UCSD-CANDE

The response time load level curve (Figure 3.5) is linear for
UCSD-CANDE as it was for AMES-TSS, but for different reasons. (Recall,
only one benchmark was run at UCSD.) UCSD has only two priority queues
for its interactive programs, the lower priority queue for processes which
exceed their previous time-slice and the higher priority queue of all
other ready jobs. All processes are served FIFO from both queues, so that
except for the possible interruption of high priority processes, even
jobs requiring long processor service times are served approximately
round robin (RR) until completion. Response time grows linearly with
load, therefore, rather than exponentially. For the arithmetic benchmark
which required 57 seconds of processing time on the average, the response
time rises to less than three times the execution time within the
observable range of the data. The curve rises above 120 seconds at
approximately the 8th load level, but given the average performance
ratio of better than 3:1 of total response time to required execution
time, a more heavily loaded system needs to be observed in order to
more accurately pinpoint a saturation level, if one exists.

Figure 3.5. Statistical Results - UCSD-CANDE
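
The link between round robin service and linear growth in response time can be illustrated with a small Python simulation. This is an idealized sketch, not a model of the CANDE scheduler: it assumes identical CPU-bound jobs, a single queue, no I/O waits and no interruption by higher priority work. The function name and the one-second quantum are hypothetical; the 57 second service requirement is the arithmetic benchmark's processing time at UCSD.

```python
from collections import deque


def rr_response_time(n_jobs, service, quantum=1):
    """Completion time of job 0 when n_jobs identical jobs, each needing
    `service` seconds of CPU, share one processor round robin."""
    queue = deque((job, service) for job in range(n_jobs))
    clock = 0
    while queue:
        job, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((job, remaining))  # back of the queue for another turn
        elif job == 0:
            return clock
    return clock


times = [rr_response_time(n, 57) for n in range(1, 6)]
steps = [later - earlier for earlier, later in zip(times, times[1:])]
print(times, steps)
```

Each additional competing job adds the same amount to the tagged job's completion time, so response time is a straight line in the number of jobs rather than a curve with a knee.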


3.2.   Comparison of Computing Systems

One of the major goals of this study of the response times on
various time-sharing systems on the ARPA network was the comparison of
system performance. Each of three benchmarks was run on from three to
five different systems, with response time measurements being made at
varying load levels. The arithmetic benchmark job was run on all five
of the systems under study. The bit string manipulating and I/O bound
benchmark jobs were run at AMES, CCN and MIT only. The load levels are
equivalent (and hence comparable) in the sense that each ith load level
represents the ith (approximately) uniformly distributed load interval
over the range of the observable data for a particular system. Reference
should be made to Section 2.2.3 for the precise load level definitions
on each system.

3.2.1.   Arithmetic  Benchmark 

Comparison plots for the arithmetic benchmark job are presented
in Figures 3.6(a) through 3.6(e). Curves shown in these figures are those
determined to be the best fit in the individual system analyses. The
BBN-TENEX response time curve dwarfs all other systems in comparison as
Figure 3.6(a) illustrates. Reducing the dependent variable scale by a
factor of more than five as was done in Figure 3.6(b) brings the comparison
of the other systems into better perspective. Figures 3.6(c) through
3.6(e) show the 95 percent nonlinear confidence intervals for the four
curves presented in Figure 3.6(b).

Figure 3.6(a). Arithmetic Benchmark Comparisons
Figure 3.6(b) (continued). Arithmetic Benchmark Comparisons
(Without BBN-TENEX)
Figure 3.6(c) (continued). Arithmetic Benchmark Comparisons
(With 95% Confidence Intervals)
Figure 3.6(d) (continued). Arithmetic Benchmark Comparisons
(With 95% Confidence Intervals)
Figure 3.6(e) (continued). Arithmetic Benchmark Comparisons
(With 95% Confidence Intervals)

*See Table 2.3 for the correspondence between Load Levels 1-10 and
the system measure of busyness for each of the five systems studied.

The CCN-TSO, AMES-TSS and MIT-MULTICS systems give very nearly
equivalent response times in the 1st through 7th load intervals. In the
8th interval, MIT becomes saturated and response time in that system
rises sharply, while CCN and AMES continue to give comparably good
response time throughout the entire observable range. An important
consideration in these observations is that AMES and MIT are producing
favorable response time data over the entire range of usage in those
systems, while the CCN data, though favorable, were collected on only a
lightly loaded system.

The UCSD-CANDE system, while giving quite acceptable response
times, is generally outperformed by all systems except BBN-TENEX.
The UCSD system reacts to saturation less radically than does the MIT
system, however, and performance is better at UCSD than at MIT in the 9th
and 10th load intervals.

If  a  strict  ranking  were  required,  from  fastest  to  slowest 
systems  in  terms  of  response  time  curves  for  the  type  of  processing 
inherent  in  the  arithmetic  benchmark  job,  it  would  be  given  as  CCN-TSO, 
AMES-TSS,  MIT-MULTICS,  UCSD-CANDE  and  BBN-TENEX.   Such  a  ranking,  though, 
must  be  considered  in  the  context  of  how  significant  the  difference 
between  any  two  particular  systems  really  is. 



3.2.2.  Bit  Manipulating  Benchmark 

The bit string manipulating benchmark was run on the three
systems that supported the PL/I programming language: AMES, CCN and MIT.
Figures 3.7(a) and 3.7(b) present the comparative response time results
for this highly CPU bound benchmark. The MIT-MULTICS system required
only two seconds of execution time on the average to complete the task
and clearly outperforms the AMES and CCN systems in terms of response
times. Even the 95 percent confidence interval is very tight and evidences
the MULTICS superiority. As Figure 3.7 indicates, the AMES and CCN
curves intersect in the 4th load interval, at which point the advantage
switches from CCN to AMES. The AMES 95 percent confidence interval is
smaller than that of CCN, however, and indicates that of AMES and CCN,
AMES generally gives the faster response time. This is true in spite of
the fact that in the AMES system the benchmark requires more than three
times (16 seconds) the execution time of the CCN system (5 seconds).

For a completely CPU bound job of only moderate length requiring
no significant amount of core and doing no significant amount of I/O, the
ranking of systems from fastest to slowest is MIT-MULTICS, AMES-TSS and
CCN-TSO.

3.2.3.  I/O  Bound  Benchmark 

The file flogging benchmark was run on the same systems as the
bit string manipulating benchmark. Figures 3.8(a) and 3.8(b) demonstrate
that MIT-MULTICS again gives the best response time performance, with
AMES-TSS clearly second and CCN-TSO third. The CCN system acknowledges
that its I/O resources are those most likely to become bottlenecked.
The wide variability in the CCN 95 percent confidence interval is
evidence of the processes outside of TSO control that also compete for
the I/O resources.

Figure 3.7(a). Bit String Benchmark Comparisons

Figure 3.7(b) (continued). Bit String Benchmark Comparisons

(With 95% Confidence Intervals)

Figure 3.8(a). I/O Bound Benchmark Comparisons

Figure 3.8(b) (continued). I/O Bound Benchmark Comparisons

(With 95% Confidence Intervals)

4. MODELING TIME-SHARING SYSTEMS

The  system  comparison  data  presented  thus  far  is  useful  in 
evaluating  the  performance  of  various  time-sharing  systems  in  reference 
to  a  given  set  of  computing  applications  (benchmark  jobs)  which  require 
a  given  amount  of  actual  processing  time.   In  order  to  compare  and  predict 
turnaround  time  for  a  wider  class  of  jobs,  however,  a  system  model  is 
desired  which  accepts  the  processing  time  of  a  job  as  an  independent 
variable  rather  than  as  an  implied  constant. 

The  approach  used  in  this  investigation  is  to  develop  an  analytical 
and/or  simulation  model  to  describe  the  behavior  of  the  various  time- 
sharing systems  under  study  as  they  process  the  number  crunching  benchmark 
job.  These  more  general  models  are  tuned  to  approximate  as  closely  as 
possible  the  behavior  of  the  already  developed  statistical  models  describing 
the  respective  systems.  The  tuned  models,  depending  on  the  success  with 
which  they  are  able  to  describe  system  behavior,  may  then  be  used  in  place 
of  the  statistical  models  to  predict  job  response  time  for  similar  job 
applications  but  for  jobs  requiring  any  amount  of  processing  time.   In 
addition,  the  effect  of  network  delays  (which  was  not  a  factor  in  the 
statistical  model)  is  introduced  into  the  analytical  and/or  simulation 
models  to  more  completely  predict  job  response  time. 

4.1. An Analytical Model for Time-Sharing Systems

During  the  late  1960's,  analytical  modeling  of  time-sharing 
systems  with  various  scheduling  disciplines  resulted  in  a  wide  range  of 
useful system models. A thorough survey of such models is presented by
L. Kleinrock [KLE72]. Basically, the systems are studied by considering
priority  disciplines  operating  in  a  stochastic  queueing  environment. 
The  essential  elements  of  such  systems  include  the  source  from  which  jobs 
emanate  for  service,  the  input  process,  the  service  process,  the  number 
of  servers  and  the  service  discipline.  Many  variations  exist  within 
and  among  these  elements,  providing  a  wide  choice  of  model  designs. 

Some  design  parameters  are  strongly  recommended  for  ease  of 
model  analysis,  such  as  Markov  assumptions  for  the  arrival  and  service 
processes.   Other  design  options  such  as  a  particular  queue  discipline 
can  be  more  closely  matched  with  the  actual  system  that  is  being  modeled. 
Below  is  a  list  of  the  set  of  design  options  that  completely  define  the 
analytical  model  used  to  represent  the  time-sharing  systems  under  study. 
Except  for  the  AMES-TSS  system,  all  the  systems  dispatch  processes 
through  a  set  of  priority  queues,  each  of  which  has  its  own  associated 
time-slice. 

Source: The source was assumed to be an infinite one. The load of a
system is equated with the number of job arrivals emanating from this
source. This assumption is not a completely accurate one since the load
on a time-sharing system is often limited by the number of terminals
with system access capability. Scherr [SCH67] has developed a model
based on a finite source.

Input Process: The input process is assumed to be the Poisson process
and is described by an interarrival time distribution denoted by A(t).
A(t) is defined by the exponential distribution

$$A(t) = \begin{cases} 1 - e^{-\lambda t}, & t \ge 0,\ \lambda > 0 \\ 0, & t < 0,\ \lambda > 0. \end{cases}$$

The mean interarrival time is then $1/\lambda$ seconds. The
interarrival times form a sequence of independent and identically
distributed random variables.


Service Process: The service process is also assumed to be exponential
and is defined by

$$B(t) = \begin{cases} 1 - e^{-\mu t}, & t \ge 0,\ \mu > 0 \\ 0, & t < 0,\ \mu > 0. \end{cases}$$

The mean service time is $1/\mu$ seconds. The service times are also
independent and identically distributed. In a measurement study by Fuchs
and Jackson [FUC70], a significant result showed that for all continuous
random variables studied, the gamma distribution was an excellent fit.
Because of the close relationship of the gamma and exponential
distributions, analytical models studied under the assumption of
exponential distributions may not be far from the truth.


Number of Servers: The number of servers is 1. The standard notation
used to describe the model thus far is M/M/1, where the first and second
parameters indicate the exponential distribution for the input and
service process, respectively, the third indicates one server and the
lack of a fourth indicates an infinite source.


Service Discipline: The service discipline is quantum controlled with a
variable quantum size, $FB_N$, FIFO, preemptive resume in the $N^{th}$
queue and having zero swap time. Each of these options is discussed
separately.

Quantum Controlled: Each process receives a maximum service time from
the service facility equal to the quantum $q_i$ associated with its
particular queue. Different quantum sizes may be associated with
different queues, but the variability is limited to a linear function of
some constant quantum.

$FB_N$: If a job has not completed processing during its quantum, it
returns to the system at the end of the next lower priority queue. There
are N such queues. Units at the $N^{th}$ level are served a quantum
$q_N$ at a time in turn until completion. That is, an $N^{th}$ level
process will be preempted by a higher level process if one exists, or by
another $N^{th}$ level process if one exists, after it has completed the
quantum-service in progress.

FIFO: The service is first-in, first-out within queues.

Swap Time: The time required to swap a process in and out of the memory
is assumed to be absorbed in the process' required service time. The
swap time is thus considered to be zero.

Of the many time-shared models presented in the literature, two
meet almost all of the specifications listed above. Wolff [WOL68] analyzes
a model identical to the one described except that it is $FB_\infty$
rather than $FB_N$. Jobs are permitted to descend through an infinite
number of priority queues before completing processing. Coffman and
Kleinrock [COF68] present a model identical to the one described except
that it does not provide for variable quantum sizes. A modification of
the Coffman-Kleinrock model extends its application to include a limited
use of variable quantum sizes.

In addition to the arrival and service time definitions already
given, the following notation will be used:

$\delta$ = a constant fractional amount of time allocated to a job on
each pass through the system

$q_i$ = the amount of time allocated to a job on its $i^{th}$ pass
through the system, $i = 1, 2, \ldots$ We require that
$\delta \le q_i$ and $q_i = m_i\delta$, $i = 1, 2, \ldots$, where $m_i$
is an integer.

$Q_j$ = the total time allocated to a job on its first j passes:

$$Q_j = \sum_{i=1}^{j} q_i, \qquad j = 1, 2, \ldots$$

Coffman and Kleinrock derive the expected response time for a
job that requires t seconds of processing. The result depends on a
parameter k, where k is the smallest integer such that $k\delta \ge t$.
Since the derivation depends on the integral property of k and since the
$q_i$ were defined to map into an integral multiple of $\delta$, the
model can be adjusted to accommodate variable $q_i$.

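For a job requiring t seconds of service in the $FB_N$ system with
fixed quanta of length $\delta$, the expected waiting time in the system
as derived by Coffman and Kleinrock is

$$W_k(t) = \frac{(\lambda/2)\,[E_k(t^2) + \gamma_k E_1(t^2)]}{[1 - \rho(1 - e^{-\mu k\delta})]\,[1 - \rho(1 - e^{-\mu(k-1)\delta})]} + \frac{\rho(1 - e^{-\mu(k-1)\delta})}{1 - \rho(1 - e^{-\mu(k-1)\delta})}\,(k-1)\delta + t, \qquad 1 \le k \le N-1 \qquad (1)$$

$$W_k(t) = \frac{\rho(1/\mu)}{(1 - \rho)\,[1 - \rho(1 - e^{-\mu(N-1)\delta})]} + \frac{\rho(1 - e^{-\mu(N-1)\delta})}{1 - \rho(1 - e^{-\mu(N-1)\delta})}\,(k-1)\delta + t, \qquad k \ge N \qquad (2)$$

where k is the smallest integer such that $k\delta \ge t$, where we define
$E_k(t^2)$ as the second moment of the distribution

$$B_k(\tau) = \begin{cases} 0, & \tau < 0 \\ 1 - e^{-\mu\tau}, & 0 \le \tau < k\delta \\ 1, & \tau \ge k\delta \end{cases}$$

with

$$E_k(t) = \frac{1}{\mu}\,[1 - e^{-\mu k\delta}],$$

$$E_k(t^2) = \frac{2}{\mu^2} - \frac{e^{-\mu k\delta}}{\mu^2}\,[(\mu k\delta)^2 + 2\mu k\delta + 2]$$

and where

$$\gamma_k = \frac{e^{-\mu k\delta}}{1 - e^{-\mu\delta}}$$

and

$$\rho = \lambda/\mu$$

where $\rho$ is a measure of system utilization.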
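
The truncated-service quantities that appear in the denominators above can be checked numerically. The following Python sketch compares the closed form for the mean of a service time cut off at $k\delta$ with a direct numerical integration, and evaluates the partial utilization $\rho(1 - e^{-\mu k\delta})$. The function names and the parameter values (mean service time 7, mean interarrival time 10, $\delta$ = 0.1) are illustrative assumptions, not values taken from the derivation.

```python
import math

def truncated_mean(mu, k, delta):
    """Closed form for the mean of an exponential service time cut off at k*delta."""
    return (1.0 - math.exp(-mu * k * delta)) / mu

def truncated_mean_numeric(mu, k, delta, steps=100_000):
    """Same mean via the midpoint rule: integral of P(S > x) over [0, k*delta]."""
    upper = k * delta
    h = upper / steps
    return sum(math.exp(-mu * (i + 0.5) * h) for i in range(steps)) * h

def level_utilization(lam, mu, k, delta):
    """rho * (1 - exp(-mu*k*delta)): load offered by the first k quanta of each job."""
    return (lam / mu) * (1.0 - math.exp(-mu * k * delta))

if __name__ == "__main__":
    mu, lam, delta = 1 / 7, 1 / 10, 0.1   # illustrative values only
    for k in (1, 10, 100):
        print(k, truncated_mean(mu, k, delta), level_utilization(lam, mu, k, delta))
```

As k grows, the partial utilization approaches $\rho = \lambda/\mu$ from below, which is why every bracketed denominator stays positive whenever $\rho < 1$.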

Now since $q_i \ge \delta$ and $q_i = m_i\delta$ for all i and $m_i$ an
integer, the number of $\delta$s required to service a job can be
partitioned into $\Delta_i$ subsets in such a way that there exists a
unique mapping between the $\Delta_i$'s and the $q_i$'s. Let the
partitions of the $\delta$s be defined by sets $\Delta_i = k_i\delta$,
with $k_i$ an integer. If a process requires t seconds of service time
with k the smallest integer such that $k\delta \ge t$ and m the smallest
integer such that $Q_m \ge t$, then partition the k $\delta$s into m
subsets, each representing a sum of $\delta$s, such that

$$k_i\delta = \Delta_i = q_i, \qquad 1 \le i \le m-1 \qquad (3)$$

and

$$\Delta_m \le q_m. \qquad (4)$$

Each $q_i$ is now associated with a particular $k_i$.

There also exists a mapping from the number of priority queues,
N, in the Coffman-Kleinrock model into the number of priority queues, N',
in the modified model. N' is associated with some $q_{N'}$. Define

$$N = \sum_{i=1}^{N'} q_i/\delta.$$

Returning to the Coffman-Kleinrock model, let the values that k
assumes in equations (1) and (2), instead of being any integer, be only
those integers $\ell_j$ for which

$$\ell_j = \sum_{i=1}^{j} k_i \qquad \text{for } 1 \le j \le m. \qquad (5)$$

Then $\ell_m$ can be substituted for k in those equations since $\ell_m$
is the smallest of the $\ell_j$ integers such that $\ell_j\delta \ge t$.
J  J 

As an example of this type of mapping, consider t = .93,
$\delta$ = .1, $q_i = 2^{i-1}\delta$ and N' = 4. Clearly,
$q_i \ge \delta$ and $q_i = m_i\delta$ for all i and $m_i$ an integer.
Further, for k = 10, k is the smallest integer such that
$k\delta = 10 \times .1 \ge t = .93$ and for m = 4, m is the smallest
integer such that
$Q_m = q_1 + q_2 + q_3 + q_4 = .1 + .2 + .4 + .8 \ge t = .93$. Now

$$\Delta_1 = 1 \times .1 = .1 = q_1, \qquad k_1 = 1,$$

$$\Delta_2 = 2 \times .1 = .2 = q_2, \qquad k_2 = 2,$$

$$\Delta_3 = 4 \times .1 = .4 = q_3, \qquad k_3 = 4,$$

$$\Delta_4 = 3 \times .1 = .3 < q_4, \qquad k_4 = 3$$

and $\ell_1 = 1$, $\ell_2 = 3$, $\ell_3 = 7$, $\ell_4 = 10$.
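
The mapping in this example can be reproduced mechanically. The Python sketch below is one possible way to compute k, m, the per-pass counts and their running totals for a job of length t. The function name `partition_quanta` and its return layout are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used only to avoid floating-point rounding when dividing by .1.

```python
from fractions import Fraction
from math import ceil

def partition_quanta(t, delta, quanta):
    """Map a job of length t onto variable quanta q_1, q_2, ... (multiples of delta).

    Returns (k, m, k_list, ell_list): k is the number of delta-quanta needed,
    m the number of passes, k_list the delta-quanta consumed on each pass and
    ell_list the running totals of k_list.
    """
    k = ceil(t / delta)                       # smallest k with k*delta >= t
    k_list, ell_list, used = [], [], 0
    for q in quanta:
        per_pass = q / delta
        if per_pass.denominator != 1:
            raise ValueError("each quantum must be an integer multiple of delta")
        take = min(int(per_pass), k - used)   # the last pass may need less than q_m
        k_list.append(take)
        used += take
        ell_list.append(used)
        if used == k:
            break
    else:
        raise ValueError("the quanta supplied do not cover the job")
    return k, len(k_list), k_list, ell_list

delta = Fraction(1, 10)
quanta = [2**i * delta for i in range(4)]     # .1, .2, .4, .8
print(partition_quanta(Fraction(93, 100), delta, quanta))
# (10, 4, [1, 2, 4, 3], [1, 3, 7, 10])
```

The printed tuple matches the values worked out by hand: k = 10, m = 4, per-pass counts 1, 2, 4, 3 and running totals 1, 3, 7, 10.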

This mapping changes the way the system is conceptualized in
a greater degree than it changes the way the system actually works.
Figure 4.1 illustrates this change for $k_i = 4$. Assuming the job
requires at least $k_i$ service quantums before completion when it
arrives, a job passing through the Coffman-Kleinrock system receives
$k_i$ short bursts of service, each time taking its place on the next
lower priority queue and waiting for jobs of higher priority to be
processed first. A job passes through the modified system in one service
burst, after having waited for all jobs queued at that priority level to
use their required service quantum of up to $k_i$. The restriction on the
choice of k divides the first type (Coffman-Kleinrock model) of system
into several "black boxes" each of which represents an equivalent service
quantum available in the second system.

Figure 4.1. Comparison of Two Models
(Panel labels: Coffman-Kleinrock Model; Modified Model; Arrival;
$4\delta$; ith priority level; jth through (j+4)th priority levels)

In order to see how the restriction on the choice of k affects
the expected waiting time results, we consider a tagged job arriving at
the $FB_N$ system in equilibrium, assuming that its service requirement
is t seconds and that k is the smallest integer such that
$k\delta \ge t$, and m the smallest integer such that $Q_m \ge t$. The
system must be divided into two disjoint subsystems to derive the
modified system equations. We will first examine the progress of the
tagged job for its first $\ell_{m-1}$ passes through the system, and then
consider the tagged job's $\ell_m^{th}$ pass through the system
separately.

We have defined $\Delta_i$ subsets, $1 \le i \le m$, to partition the
$\delta$-quantums required to service a job. We now consider the waiting
time in queue of the tagged job as it passes through a $\Delta_i$ subset
of quanta for any $i < m$. We will define this waiting time as $W_i$,
where

$$W_i = W_{\ell_i}(t) - W_{\ell_{i-1}}(t), \qquad i < m. \qquad (6)$$

Assuming that the units in all queues of priority higher than i have been
processed, in the modified system the waiting time of the tagged job
is affected only by those jobs which are ahead of it in the $i^{th}$
queue. These jobs will receive their $q_i$ quantum of service under a
strictly FIFO discipline, and then the tagged job will receive its $q_i$
quantum of service, completely independent of jobs which have arrived
during the waiting interval of the tagged job on queue i. This is not the
case in the Coffman-Kleinrock system.

Still working under the assumption that the units in all higher
priority queues have been processed, and also $k_i > 1$, in the
Coffman-Kleinrock system a tagged job's total waiting time in the j
$\delta$-quantum queues, $\ell_{i-1} + 1 \le j \le \ell_i$, is dependent
upon new arrivals that occur during the tagged unit's waiting time. This
is so because these new arrivals will start to receive $\delta$ quanta of
processing time before the tagged job has received its total j
$\delta$-quantum service slices. If no new arrivals occurred during the
tagged job's waiting time, the tagged job would experience identical
waiting times in both systems. The waiting time in the j $\delta$-quantum
queues in the Coffman-Kleinrock system is greater than that in the
$i^{th}$ queue in the modified system by a factor that depends on the
average number of new arrivals to that set of queues. We define $E(T_i)$
to be this extra expected waiting time.

The average number of arrivals must be based on
$W_i + (k_i - 1)\delta$ since new arrivals can seize $\delta$-quantums of
service until the tagged job begins its last $\delta$ of processing. The
average arrival rate to the $i^{th}$ queue, $\lambda_i$, is determined by
the following consideration. A job arrives for service at the $i^{th}$
queue only if it requires more than $\ell_{i-1}$ seconds of processing.
We recall that $B(t) = 1 - e^{-\mu t}$ is the service time distribution,
where B(t) represents the probability that the service time is less than
or equal to t. The inverse is formed by solving for t:

$$t = -(1/\mu)\,\ln[1 - B(t)].$$

The inverse form can be used to calculate the probability that t is
greater than some particular $\ell_{i-1}$. If we call this probability
$p_{t>\ell_{i-1}}$, then

$$\lambda_i = p_{t>\ell_{i-1}}\,\lambda. \qquad (7)$$

As an example of this process, we consider $\ell_{i-1} = 7$ and seek to
discover the probability that $t > 7$. For $t \le 7$ and the service rate
$\mu$ of this example, $B(t) = .37$ so that $p_{t>7} = .63$ and the
arrival rate to $\ell_i$ is given by $\lambda_i = .63\lambda$.
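
The thinning of the arrival stream in this example can be sketched in a few lines of Python. The value $\mu = 1/15$ used below is an assumption: it is chosen here only because it reproduces the quoted figures of .37 and .63 for a threshold of 7. The function names are likewise hypothetical.

```python
import math

def survival(t, mu):
    """P(service time > t) for exponential service with rate mu."""
    return math.exp(-mu * t)

def inverse_service_cdf(b, mu):
    """Solve B(t) = b for t, i.e. t = -(1/mu) * ln(1 - b)."""
    return -(1.0 / mu) * math.log(1.0 - b)

def thinned_arrival_rate(lam, mu, threshold):
    """Arrival rate of jobs that still need service after `threshold` seconds."""
    return lam * survival(threshold, mu)

mu = 1 / 15                    # assumed value; reproduces B(7) of about .37
b7 = 1 - survival(7, mu)
print(round(b7, 2), round(1 - b7, 2))   # 0.37 0.63
```

Only the fraction of arrivals whose demand exceeds the threshold ever reaches the lower queue, so the rate seen there is the overall rate scaled by the survival probability.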
Having determined the arrival rate, $\lambda_i$, and the interval in
which these arrivals take place, $W_i + (k_i - 1)\delta$, the average
number of arrivals is calculated as the product of these two quantities.
The time by which the tagged job will be delayed is the product of this
average number of arrivals and their average service time requirement,
$\bar{t}_{k_i-1}(t)$. Only service times strictly less than or exactly
equal to $(k_i - 1)\delta$ are significant here since $(k_i - 1)\delta$
is the maximum service time a job arriving to this queue will receive
before the tagged job completes its service requirements. The expression
for the average service time, therefore, is given by

$$\bar{t}_{k_i-1}(t) = \int_0^{(k_i-1)\delta} x\mu e^{-\mu x}\,dx \left(1 - \int_{(k_i-1)\delta}^{\infty} \mu e^{-\mu x}\,dx\right) + (k_i-1)\delta \int_{(k_i-1)\delta}^{\infty} \mu e^{-\mu x}\,dx \qquad (8)$$

where the first term represents the average service time of jobs requiring
less than $(k_i - 1)$ units of service time and the second term represents
the average service time of jobs requiring $(k_i - 1)$ or more units of
service time. Performing the integration and simplifying, equation (8)
becomes

$$\bar{t}_{k_i-1}(t) = \frac{1}{\mu} - \frac{2}{\mu}\,e^{-(k_i-1)\delta\mu} + \left[(k_i-1)\delta + \frac{1}{\mu}\right] e^{-2(k_i-1)\delta\mu}. \qquad (9)$$
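
The step from the integral expression (8) to the closed form (9) can be checked numerically. The Python sketch below assumes that (8) is read as the partial mean over $[0, a]$ multiplied by one minus the tail probability, plus a times the tail probability, with $a = (k_i - 1)\delta$. The function names and test values are illustrative.

```python
import math

def avg_service_eq8(mu, a, steps=200_000):
    """Expression (8) evaluated numerically for a = (k_i - 1) * delta."""
    h = a / steps
    # Midpoint rule for the integral of x * mu * exp(-mu * x) over [0, a].
    partial_mean = sum((i + 0.5) * h * mu * math.exp(-mu * (i + 0.5) * h)
                       for i in range(steps)) * h
    tail = math.exp(-mu * a)      # integral of mu * exp(-mu * x) over [a, infinity)
    return partial_mean * (1.0 - tail) + a * tail

def avg_service_eq9(mu, a):
    """Closed form (9)."""
    return (1 / mu
            - (2 / mu) * math.exp(-mu * a)
            + (a + 1 / mu) * math.exp(-2 * mu * a))

if __name__ == "__main__":
    for mu, a in ((1 / 7, 0.3), (0.5, 3.0)):
        print(mu, a, avg_service_eq8(mu, a), avg_service_eq9(mu, a))
```

Under that reading the two functions agree to within the integration error, which supports the coefficients $1/\mu$, $-2/\mu$ and $(k_i - 1)\delta + 1/\mu$ in (9).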


Now since the waiting time interval is lengthened by the added arrivals
acquiring their service quantums, an infinite summation of these service
quanta is required so that $E(T_i)$, the added expected waiting time in
the Coffman-Kleinrock model, becomes

$$E(T_i) = [W_i + (k_i - 1)\delta] \left[\sum_{j=1}^{\infty} \left(\lambda_i\,\bar{t}_{k_i-1}(t)\right)^j\right] \qquad (10)$$

where

$$\rho_i = \lambda_i\,\bar{t}_{k_i-1}(t)$$

and

$$\sum_{j=1}^{\infty} (\rho_i)^j = \frac{\rho_i}{1 - \rho_i}$$

so that

$$E(T_i) = \frac{\rho_i\,[W_i + (k_i - 1)\delta]}{1 - \rho_i}. \qquad (11)$$

So in the modified Coffman-Kleinrock system, for each $i^{th}$ priority
queue, $i < m$ and $k_i > 1$, $E(T_i)$ must be subtracted from the
Coffman-Kleinrock response time equations, or the term
$-\sum_{i=1}^{m-1} E(T_i)$ must be added. These terms may be considered
independently for each $\Delta_i$ subset because even though a job may
wait longer to complete service in the j $\delta$-quantum queues of the
Coffman-Kleinrock system than in the corresponding $i^{th}$ queue in the
modified system, the relative ordering of the jobs does not change from
one system to the other. That is, when the job arrives at either the
$(\ell_i + 1)^{st}$ or $(i+1)^{st}$ queue it sees the same queue
configuration in either system.
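
The extra waiting time for one priority queue is a geometric series, and its closed form can be compared with a truncated sum. The Python sketch below is an illustration under stated assumptions: the per-queue load is taken to be the product of the queue's arrival rate and the average service time requirement, and the function names and numeric inputs are hypothetical.

```python
def extra_wait(w_i, k_i, delta, lam_i, tbar):
    """Closed form (11): rho_i * [W_i + (k_i - 1) * delta] / (1 - rho_i)."""
    rho_i = lam_i * tbar
    if not 0 <= rho_i < 1:
        raise ValueError("the geometric series converges only for 0 <= rho_i < 1")
    return rho_i * (w_i + (k_i - 1) * delta) / (1 - rho_i)

def extra_wait_series(w_i, k_i, delta, lam_i, tbar, terms=200):
    """Partial sum of the series in (10), for comparison with the closed form."""
    rho_i = lam_i * tbar
    return (w_i + (k_i - 1) * delta) * sum(rho_i ** j for j in range(1, terms + 1))

if __name__ == "__main__":
    args = (2.0, 4, 0.1, 0.5, 0.25)          # illustrative inputs, rho_i = 0.125
    print(extra_wait(*args), extra_wait_series(*args))
```

The correction to the response time is then the negative of the sum of these terms over the queues with $i < m$ and $k_i > 1$.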

We now consider the $\ell_m^{th}$ or $m^{th}$ pass, where the waiting
time is not the same for the two models if $q_m > \delta$. In the fixed
quantum system, a job continuously receives small bursts of service up to
and including its $k^{th}$ burst, waiting only for other jobs in the
system to receive their same bursts up to m. But in the variable sized
quantum system, a job that is queued for service at the $m^{th}$ priority
level must wait until the jobs ahead of it receive their total quantum of
service, up to the maximum allotted at that level.

Since an arrival requires service at the $\ell_m^{th}$ priority queue
(which consists of $\ell_m - \ell_{m-1}$ $\delta$-service queues) or the
$m^{th}$ priority queue only if it requires in excess of $\ell_{m-1}$
seconds of service, the average arrival rate to the $\ell_m^{th}$ or
$m^{th}$ queue, $\lambda_{\ell_m}$, is given by

$$\lambda_{\ell_m} = p_{t>\ell_{m-1}}\,\lambda.$$

If $W_{\ell_m}$ is the time our tagged job must wait for service, then
the expected average number of arrivals to the $\ell_m^{th}$ queue must
be based on $W_{\ell_m} + \ell_{m-1}\delta$ since the tagged job receives
$\ell_{m-1}\delta$ seconds of service before reaching the $\ell_m^{th}$
queue. Therefore, the expected average number of arrivals to the
$\ell_m^{th}$ queue prior to the tagged job would be

$$\lambda_{\ell_m}\,[W_{\ell_m} + \ell_{m-1}\delta].$$

The average service time distribution for the queue arrivals would differ
depending on whether the job was serviced in system one, the
Coffman-Kleinrock system, or in system two, the modified
Coffman-Kleinrock system.

In system one, $\ell_m - \ell_{m-1}$ queues remain through which the
tagged job must pass before completion. Each of the arrivals to the
$(\ell_{m-1} + 1)^{st}$ queue must have remaining quanta of service of
which $\bar{t}_{\ell_m - \ell_{m-1}}(t)$ is the average amount. The
expected time to process all jobs before the tagged job in the
$\ell_m - \ell_{m-1}$ interval is therefore

$$\lambda_{\ell_m}\,[W_{\ell_m} + \ell_{m-1}\delta]\;\bar{t}_{\ell_m - \ell_{m-1}}(t).$$

This waiting time is already included in the Coffman-Kleinrock equations.

In system two, each arrival to the $m^{th}$ queue will have remaining
a quantum of service of which $\bar{t}_{q_m}(t)$ is the average amount.
The expected time to process all jobs before the tagged job is therefore

$$\lambda_{\ell_m}\,[W_{\ell_m} + \ell_{m-1}\delta]\;\bar{t}_{q_m}(t).$$

The term that must be added to the Coffman-Kleinrock equations (1)
and (2), therefore, to make the results valid for the variable quantum
size model is

$$\lambda_{\ell_m}\,[W_{\ell_m} + \ell_{m-1}\delta]\,[\bar{t}_{q_m}(t) - \bar{t}_{\ell_m - \ell_{m-1}}(t)].$$

If $q_m = \ell_m - \ell_{m-1}$, then the term is zero.

Thus,  with  the  two  modifications  detailed  above,  the  modified 

Coffman-Kleinrock  model  becomes  directly  applicable  to  time-sharing 

systems  of  the  type  represented  by  the  general  time-sharing  model  of 

Figure  2.2. 

4.2. A Simulation Model for Time-Sharing Systems

A GPSS simulation model of a time-sharing system with a
scheduling discipline identical to that specified for the analytical
model was also developed. A flowchart of this model as it simulates
the MIT-MULTICS time-sharing system is presented in Figures 4.2(a)
through 4.2(e), with the chart symbols identical to those of Schriber's
in his General Purpose Simulation System/360: Introductory Concepts
and Case Studies [SCH71]. Figures 4.2(b) through 4.2(d) illustrate the
heart of the simulator as it generates jobs with exponentially distributed
interarrival times, assigns service times exponentially distributed about
some mean, and services the jobs according to the scheduling discipline
described for MIT-MULTICS in section 2.2.1.4. The simulator generates
tagged jobs for data collection purposes and this process is diagrammed
in Figure 4.2(a). Figure 4.2(e) shows the control module for desired
running time of the simulator.

Figure 4.2(a). Simulation of MIT-MULTICS Time-Sharing Scheduler
(Generation of Tagged Jobs)

Figure 4.2(b) (continued). Simulation of MIT-MULTICS Time-Sharing Scheduler
(Generation of Jobstream)

Figure 4.2(c) (continued). Simulation of MIT-MULTICS Time-Sharing Scheduler
(Scheduling Discipline)

Figure 4.2(d) (continued). Simulation of MIT-MULTICS Time-Sharing Scheduler
(Job Parameter Updating)

Figure 4.2(e) (continued). Simulation of MIT-MULTICS Time-Sharing Scheduler
(Run Time Control)
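
For readers without access to GPSS, the scheduling discipline being simulated can be sketched in Python. This is not the GPSS program described here. It is a minimal event-driven re-sketch under the same modeling assumptions (Poisson arrivals, exponential service, one server, N priority queues with a fixed quantum per level, FIFO within a queue, round robin at the last level, zero swap time). The function name `simulate_fbn`, its parameters and the quantum values in the usage note are hypothetical.

```python
import random
from collections import deque

def simulate_fbn(lam, mu, quanta, n_jobs=20000, seed=1):
    """Simulate an M/M/1 FB_N queue; return a list of (service, response) pairs."""
    rng = random.Random(seed)
    # Pre-generate Poisson arrivals and exponential service demands.
    arrivals, t = [], 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)
        arrivals.append((t, rng.expovariate(mu)))
    queues = [deque() for _ in quanta]    # queues[0] has the highest priority
    done, clock, nxt = [], 0.0, 0

    def admit(now):
        """Move every job that has arrived by `now` into the top queue."""
        nonlocal nxt
        while nxt < n_jobs and arrivals[nxt][0] <= now:
            arr, need = arrivals[nxt]
            queues[0].append([arr, need, need])   # [arrival, demand, remaining]
            nxt += 1

    while len(done) < n_jobs:
        admit(clock)
        level = next((i for i, q in enumerate(queues) if q), None)
        if level is None:                 # idle: jump to the next arrival
            clock = arrivals[nxt][0]
            continue
        job = queues[level].popleft()
        run = min(job[2], quanta[level])  # a quantum in progress is never interrupted
        clock += run
        job[2] -= run
        admit(clock)                      # arrivals during the quantum queue first
        if job[2] <= 1e-12:
            done.append((job[1], clock - job[0]))
        else:                             # demote; the last level is round robin
            queues[min(level + 1, len(queues) - 1)].append(job)
    return done
```

A call such as `simulate_fbn(lam=1/10, mu=1/7, quanta=[0.1, 0.2, 0.4, 0.8])` returns one (service demand, response time) pair per job, which can then be grouped by service demand to estimate a response time curve. With a single queue and a very large quantum the sketch reduces to first-come, first-served M/M/1, which gives a simple sanity check against the known mean response time $1/(\mu - \lambda)$.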

4.3.   Analysis  of  Model  Predictions 

The analytic and simulation models were developed to generalize
the predictive capability of the statistical response time models. The
conceptualization and definition of the analytical and simulation models
were derived from the Generalized Time-sharing Scheduling diagram shown
in an earlier chapter in Figure 2.2. As a result of the generalized
conceptualization of the models, they can be expected to most closely
describe those time-sharing systems which are most similar to the
generalization. Since the AMES-TSS system scheduler depends on core
usage behavior rather than processor usage, the analytical and simulation
models do not apply to that system. They also are not applicable to
the UCSD-CANDE system, since the models allow a service slice that may
differ from one priority level to another but is fixed at each level,
whereas time-slices are dynamically awarded in CANDE's priority queues as
a function of parameters generated during past processor usage. The
models, therefore, have been particularized to the three remaining
time-sharing systems: BBN-TENEX, CCN-TSO and MIT-MULTICS.

4.3.1. Individual System Results

BBN-TENEX Models: Both the analytic and simulation models
were developed for the BBN-TENEX system. The results of these model
predictions are plotted in Figure 4.3. The analytic model is valid only
for values of system utilization, $\rho$, less than one, so that since
the BBN system saturates under relatively light loads, predictions from
the analytic model are possible only for load levels 1-6.

The  technique  used  to  tune  the  analytic  and  simulation  models 
to  closely  represent  the  TENEX  system  was,  after  setting  up  the 
appropriate  priority  queues  and  assigning  their  associated  time-slices, 
to  adjust  the  average  service  time  and  average  interarrival  rate 
parameters  so  that  the  analytical  model  prediction  for  the  number  crunching 
benchmark  job  was  as  similar  to  the  statistical  model  prediction  as  seemed 
feasible.  For  TENEX,  the  best  results  were  obtained  for  the  average 
service  time  equal  to  twenty  seconds.   The  average  interarrival  rates 
were  associated  with  previously  defined  TENEX  load  levels  as  indicated 
in Table 4.1.
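
The restriction of the analytic model to utilizations below one can be illustrated with the tuned TENEX parameters. The Python sketch below rests on two assumptions that should be checked against the original table: that the tabulated "interarrival rate" values are mean times between arrivals in seconds, and that the BBN-TENEX entries are 29, 25, 24, 23, 22 and 21. The function name is hypothetical.

```python
def utilization(mean_service, mean_interarrival):
    """rho = lambda / mu = mean service time / mean interarrival time."""
    return mean_service / mean_interarrival

# Assumed reading of the BBN-TENEX entries (seconds between arrivals).
tenex_interarrival = [29, 25, 24, 23, 22, 21]
rhos = [utilization(20, a) for a in tenex_interarrival]

for a, rho in zip(tenex_interarrival, rhos):
    print(a, round(rho, 3))
```

Under these assumptions every utilization stays below one but approaches it as the interarrival time nears the twenty second service time, which is consistent with the analytic model being usable only at the lighter load levels.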

As can be observed from Figure 4.3, both the analytic and
simulation model plots yield satisfactorily close fits to the statistical
model plot. They are also well within the 95 percent confidence interval
of the statistical model.

Figure 4.3. Model Comparison - BBN-TENEX

Table 4.1. Analytical Model Parameters


Load 
Level 

Associated  Average  Interarrival  Rate 

BBN-TENEX 

CCN-TSO 

MIT-MULTICS 

1 

29 

60 

2 

25 

60 

3 

24

30

30

3.5

23

4

20

4.5

22

5

20

15

6

21

15 

12 

7 

12 

10 

8 

8 

8.5 

10 

9 

9.5

8 

10 

19 

CCN-TSO Models: Since the run time needed to obtain comparable response
time results is greater by a factor of approximately ten for the
simulation model than for the analytical model, and since the analytical
model results correspond so closely with those of the statistical model
for the CCN-TSO system, only the analytical model was developed in this
case. Model comparison results are presented in Figure 4.4. The average
service time for jobs in this system was tuned to seven seconds and the
average interarrival rates are associated with load levels as shown in
Table 4.1. The 95 percent confidence interval is a relatively wide one
for the TSO statistical model and the analytical model results fall well
within this interval for all load levels.

Figure 4.4. Model Comparison - CCN-TSO

MIT-MULTICS Models--Because the statistical response time curve
for the MIT-MULTICS system was most like that usually associated with
time-sharing systems, this system was initially used to develop and
validate both the analytical and simulation models. The average service
time was tuned to seven seconds and the average interarrival rate/load
level association can again be found in Table 4.1. The models plotted
in Figure 4.5 verify that the analytical and simulation models yield
very nearly identical results for this well-behaved MULTICS system and
that, for all but an interval of approximately one load level (6.5-7.5),
the analytical and simulation models fall within the 95 percent confidence
interval of the statistical model. This confidence interval is relatively
tight and it is only near system saturation that the two models tend to
move unacceptably far away from the statistical model results. This
discrepancy is easily explained by the fact that the MIT-MULTICS system
deviates from the generalized time-sharing scheduler model in that it
has two processors rather than one. The analytical and simulation models,
therefore, would approach saturation more quickly than the statistical
model, which represents actual two-processor system data.
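
The saturation argument can be illustrated with textbook queueing formulas. The sketch below is not the analytical or simulation model of this study; it assumes Poisson arrivals and exponential service (M/M/1 and M/M/2 queues), borrows only the seven-second average service time quoted above, and uses interarrival times chosen purely for illustration.

```python
import math


def mm1_response(service, interarrival):
    """Mean response time of an M/M/1 queue (one processor)."""
    rho = service / interarrival          # processor utilization
    if rho >= 1.0:
        return math.inf                   # saturated: the queue grows without bound
    return service / (1.0 - rho)


def mm2_response(service, interarrival):
    """Mean response time of an M/M/2 queue (two identical processors)."""
    rho = service / (2.0 * interarrival)  # utilization per processor
    if rho >= 1.0:
        return math.inf
    return service / (1.0 - rho * rho)


SERVICE_SEC = 7.0  # average service time quoted in the text

# Illustrative interarrival times (seconds); not taken from the study's data.
for interarrival in (30.0, 15.0, 10.0, 8.0, 7.0):
    one = mm1_response(SERVICE_SEC, interarrival)
    two = mm2_response(SERVICE_SEC, interarrival)
    print(f"interarrival {interarrival:5.1f} s   one processor {one:8.2f} s   "
          f"two processors {two:6.2f} s")
```

Under these assumptions the single-processor curve diverges as the interarrival time approaches the service time, while the two-processor curve is still nearly flat, which is the direction of the discrepancy described above.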

4.3.2.   Success of Model Generalization

The striking success with which the analytical and simulation
models were able to describe system behavior for the set of time-sharing
systems whose scheduling discipline can be conceptualized by the
generalized time-sharing model (Figure 2.2) indicates that these models
can be used for more extended predictions of system behavior. Having
been validated against the statistical models based on actual system
measurements, the analytic and simulation models can now be utilized
to predict system behavior for jobs with characteristics similar to the
number-crunching benchmark, but with variable processing time requirements.
One example of a set of such predictions is shown in Figure 4.6. In
this case, the MIT-MULTICS simulation model was used to predict response
times for jobs requiring various amounts of processing time, t, as the
load level increases.

Figure 4.5.   Model Comparison - MIT-MULTICS

The  relative  ease  with  which  the  analytical  and  simulation 
models  could  be  tuned  to  reproduce  the  statistical  model  results  for 
the  number  crunching  benchmark  job  indicates  that  this  process  could  be 
easily  repeated  for  the  other  benchmark  jobs  on  the  appropriate  systems 
(CCN-TSO and MIT-MULTICS, since only the number crunching benchmark was
run on BBN-TENEX).

Thus,  the  goal  of  finding  a  single  model  capable  of  describing 
and  predicting  response  times  for  time-sharing  systems  has  been  accomplished 
in  the  case  where  the  time-sharing  scheduling  discipline  depends  on 
quanta  fixed  at  each  priority  level,  but  variable  across  priority  levels, 
and  on  the  past  processing  history  of  the  job  to  be  serviced.  Although 
both the analytical and simulation models successfully meet this goal, the
analytical model produces its results in approximately one-tenth the time
required by the simulation model and it, therefore, may be the more
practical model for actual use in cases where response times for load
levels beyond the saturation point of the system are not required.

Figure 4.6.   Generalized Simulation Model Results

4.4.   Consideration of Network Queueing Delays

The  response  time  measurements  taken  on  individual  systems 
of  the  ARPA  network  for  this  study  did  not  distinguish  the  delay  due 
to  network  transmission  and  queueing  from  the  delay  due  to  individual 
system busyness. This dichotomy of delays was considered to be
insignificant at the time the measurements were taken since network
traffic was generally light and only a short run command, as opposed
to the total benchmark program, was transmitted. Network transmission
and queueing delays were estimated at their maximum to be on the order
of 0.1 second in either direction and as such did not contribute measurably
to  the  individual  system  response  time  delays. 

The  question  now  arises  as  to  the  effect  of  network  transmission 
and  queueing  delays  on  comparative  system  response  times,  given  that  in  the 
future network traffic increases by a significant amount. G. D. Cole,
in his extensive measurement work on the ARPA network [COL71], develops
expressions for the serial transmission delays and the queueing component
delays of ARPA network messages. The network delay time as calculated
using Cole's expressions can be added to the delay times generated by
the individual system response time models to form a composite response
time model.

The delay caused by physically sending a message on the ARPA
network from one node to any other node has two components: the service
times at each IMP to store and forward the message and the actual serial
transmission delay. For this experiment, the run commands were either
one- or two-word messages and their expected store-and-forward service
times were 3.4 and 4.0 msec, respectively, at each IMP [COL71, p. 131].
The propagation delay is about 10 μsec/mile, resulting in a cross-country
delay of approximately 30 msec. Assuming that the University of Illinois
node is the one from which all messages originate, and assuming that
routing occurs in an environment in which all nodes are connected as
shown in Figure 2.1, then transmission delays for the run command
message can be estimated. Table 4.2 summarizes these calculations.
Inspection of the table reveals that even the longest transmission delay
of 0.05 seconds to UCSD is insignificant when response time measurements
are recorded in seconds.

Table 4.2.   Transmission Times from Illinois to Experimental Sites

              No. of store    Expected service   Total expected   Propagation   Total trans-
              & forward       time at each       IMP service      delay         mission time
Destination   transmissions   IMP (msec)         time (msec)      (msec)        (msec)

AMES               4               4.0                16.0            20.0          36.0
BBN                3               3.4                10.2            10.0          20.2
CCN                7               3.4                22.8            20.0          42.8
MIT                1               3.4                 3.4            10.0          13.4
UCSD               8               4.0                32.0            20.0          52.0
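
The arithmetic behind Table 4.2 can be sketched as follows. This is a minimal illustration, assuming that each total is simply the number of store-and-forward transmissions times the per-IMP service time, plus the propagation delay; the dictionary and function names are illustrative only.

```python
# (store-and-forward transmissions, service time per IMP in msec,
#  propagation delay in msec), as listed in Table 4.2
ROUTES = {
    "AMES": (4, 4.0, 20.0),
    "BBN": (3, 3.4, 10.0),
    "CCN": (7, 3.4, 20.0),
    "MIT": (1, 3.4, 10.0),
    "UCSD": (8, 4.0, 20.0),
}


def transmission_time_msec(hops, service_msec, propagation_msec):
    """Total IMP service time over all hops plus the propagation delay."""
    return hops * service_msec + propagation_msec


for site, route in ROUTES.items():
    print(f"{site:5s} {transmission_time_msec(*route):5.1f} msec")
```

For CCN the product 7 x 3.4 comes to 23.8 msec, one millisecond more than the 22.8 msec printed in the table. Either way every total stays well under 0.1 second, so the conclusion drawn from the table is unaffected.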

The queueing component of message delay may be a significant
addition to individual response time, however, if the ARPA network
becomes congested. Cole's expression for the expected message queueing
delay [COL71, p. 134] is

$$
W = \frac{\dfrac{\lambda_m}{2}\left[(x_a)^2 + x_a x_m + (x_m)^2\right]}
         {\left[1 - \lambda_m x_a\right]\left[1 - \lambda_m (x_a + x_m)\right]}
$$

with variables defined in the following way:

$\lambda_m$ - arrival rate of messages into the network.

$x_a$ - service time for an ACK or acknowledgment.
Each message is answered by a request for
next message, RFNM, which must in turn be
answered by an ACK. Therefore, a number
of ACKs will be in contention for the service
facility along with the messages themselves,
and in heavy traffic conditions, will effectively
increase each service time by the 3.0 msec that
is required to transmit an ACK.

$x_m$ - average message service time.

Using the average message service times for the various destinations
listed in Table 4.2 and allowing $\lambda_m$ to increase, the effect of
network congestion on comparative response times can now be investigated.
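
A minimal sketch of how Cole's expression might be evaluated is given below. It assumes the expression reads W = (lambda_m/2)[x_a^2 + x_a x_m + x_m^2] / ([1 - lambda_m x_a][1 - lambda_m (x_a + x_m)]), as reconstructed here from the printed formula. The function name and the sample arrival rates are illustrative and do not come from [COL71].

```python
import math


def cole_queueing_delay(lambda_m, x_a, x_m):
    """Expected message queueing delay W.

    lambda_m -- message arrival rate (messages per unit time)
    x_a      -- ACK service time
    x_m      -- average message service time
    All times share one unit (msec here); W is returned in that unit.
    """
    ack_term = 1.0 - lambda_m * x_a
    total_term = 1.0 - lambda_m * (x_a + x_m)
    if total_term <= 0.0:
        return math.inf  # at or beyond saturation the delay is unbounded
    numerator = (lambda_m / 2.0) * (x_a ** 2 + x_a * x_m + x_m ** 2)
    return numerator / (ack_term * total_term)


# One-hop destination (x_m = 3.4 msec) with the 3.0 msec ACK service time;
# the arrival rates (messages per msec) are illustrative values only.
for lambda_m in (0.05, 0.10, 0.15):
    delay = cole_queueing_delay(lambda_m, x_a=3.0, x_m=3.4)
    print(f"lambda_m = {lambda_m:.2f}/msec   W = {delay:.2f} msec")
```

The delay grows slowly at first and then sharply as lambda_m (x_a + x_m) approaches one, which is why the saturation point makes a convenient basis for comparing destinations.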

Cole defines an alternative system descriptor to $\lambda_m$ called
$T_a$, where $T_a$ is the transmission attempt interval, or the time
between "attempts" at transmission, since no transmission will occur on
a link which is waiting for a RFNM (Request for Next Message) return.
Further, if $N$ is the number of active nodes or "generators" of
transmissions, then $\lambda_m = N/T_a$. Assuming that 34 nodes are
active simultaneously on the ARPA network, the value of $T_a$ for which
the expected queueing delay approaches infinity for transmission of a
message to a particular node becomes a meaningful basis of comparison
between nodes.

For example, if messages are being transmitted from the
University of Illinois node to one of the five systems investigated in
this study, then network queueing delays to each of these systems
approach infinity for the value of $T_a$ listed in Table 4.3. From
the table, it can be observed that while queueing delays to UCSD from
the University of Illinois approach infinity when the network transmission
attempt interval is slightly higher than 1 second, transmissions to MIT
are not adversely affected until $T_a$ is close to 0.2 seconds. Not evident
from the table information is the fact that network transmission and
service speeds of the order of milliseconds cause the queueing delays
to be sensitive to changes in transmission attempt intervals of the order
of milliseconds. Queueing delays to UCSD, for instance, do not rise
above one second until $T_a$ = 1.215 seconds. From that point, congestion
quickly increases so that at $T_a$ = 1.195 seconds saturation occurs.
Likewise, at MIT queueing delays rise above one second only at $T_a$ = 0.218
seconds and saturation occurs at $T_a$ = 0.217 seconds.

Table 4.3.   Infinite Network Delays from U. of I. Node

                  $T_a$ value at which network
                  queueing delays approach infinity
                  (msec.)

AMES-TSS                 645
BBN-TENEX                448
CCN-TSO                  878
MIT-MULTICS              217
UCSD-CANDE              1190
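
The entries of Table 4.3 can be checked with a short calculation. The sketch below assumes that saturation occurs where the second denominator factor of Cole's expression vanishes, that is where $\lambda_m (x_a + x_m) = 1$ with $\lambda_m = N/T_a$, so that $T_a = N (x_a + x_m)$. It also assumes that $x_m$ is the total expected IMP service time listed for each destination in Table 4.2; the names used are illustrative.

```python
N_ACTIVE_NODES = 34      # simultaneously active nodes assumed in the text
ACK_SERVICE_MSEC = 3.0   # x_a, time required to transmit an ACK

# x_m per destination: total expected IMP service time (msec) from Table 4.2
MESSAGE_SERVICE_MSEC = {
    "AMES-TSS": 16.0,
    "BBN-TENEX": 10.2,
    "CCN-TSO": 22.8,
    "MIT-MULTICS": 3.4,
    "UCSD-CANDE": 32.0,
}

# Values printed in Table 4.3 (msec), for comparison
TABLE_4_3_MSEC = {
    "AMES-TSS": 645,
    "BBN-TENEX": 448,
    "CCN-TSO": 878,
    "MIT-MULTICS": 217,
    "UCSD-CANDE": 1190,
}


def saturation_interval_msec(x_m, x_a=ACK_SERVICE_MSEC, n=N_ACTIVE_NODES):
    """Transmission attempt interval T_a at which lambda_m * (x_a + x_m) = 1."""
    return n * (x_a + x_m)


for node, x_m in MESSAGE_SERVICE_MSEC.items():
    computed = saturation_interval_msec(x_m)
    print(f"{node:12s} computed {computed:7.1f} msec   "
          f"table {TABLE_4_3_MSEC[node]:5d} msec")
```

Under these assumptions each computed value falls within about one millisecond of the corresponding table entry.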

In cases where one system responds faster than another, but
where network traffic causes larger queueing delays to the faster system
for a given (low) value of $T_a$, the network queueing delay becomes a
significant consideration in system comparison during periods of heavy
network usage and must be included as a part of the predictive response
time models.
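
The following sketch shows how such a reversal could arise in a composite model. The system response times of 4.0 and 3.0 seconds are hypothetical, chosen only to make the faster system the more distant one. The service and transmission times are those of Table 4.2, the network delay is added once (one direction) as a simplification, and the queueing formula is the reconstruction of Cole's expression given above with $\lambda_m = N/T_a$.

```python
import math

N_ACTIVE_NODES = 34
ACK_SERVICE_MSEC = 3.0


def queueing_delay_msec(t_a_msec, x_m, x_a=ACK_SERVICE_MSEC, n=N_ACTIVE_NODES):
    """Cole's expected queueing delay for transmission attempt interval t_a."""
    lam = n / t_a_msec
    spare = 1.0 - lam * (x_a + x_m)
    if spare <= 0.0:
        return math.inf  # network saturated for this destination
    numerator = (lam / 2.0) * (x_a ** 2 + x_a * x_m + x_m ** 2)
    return numerator / ((1.0 - lam * x_a) * spare)


# node: (HYPOTHETICAL system response time in sec,
#        IMP service time x_m in msec, transmission time in msec)
NODES = {
    "MIT-MULTICS": (4.0, 3.4, 13.4),
    "UCSD-CANDE": (3.0, 32.0, 52.0),
}


def composite_response_sec(node, t_a_msec):
    """System response time plus network transmission and queueing delay."""
    system_sec, x_m, transmission_msec = NODES[node]
    network_msec = transmission_msec + queueing_delay_msec(t_a_msec, x_m)
    return system_sec + network_msec / 1000.0


def fastest_node(t_a_msec):
    return min(NODES, key=lambda node: composite_response_sec(node, t_a_msec))


for t_a in (5000.0, 1200.0):
    print(f"T_a = {t_a:6.0f} msec -> fastest node: {fastest_node(t_a)}")
```

With light traffic the hypothetically faster UCSD system remains the better choice; as the attempt interval nears the UCSD saturation value, the queueing delay outweighs its advantage and the ranking reverses.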


5.   A DYNAMIC RESPONSE TIME MONITOR

The  major  purpose  of  this  research  was  to  investigate 
methodologies  and  models  which  could  be  utilized  to  develop  a  dynamic 
response time monitor for ARPA network users. The monitor is to supply
on-line,  real-time  information  about  the  level  of  busyness  or  load  level 
of  each  computing  node  of  the  network  and  also  to  supply  comparative 
response  time  data  for  particular  computing  applications  for  each  of  these 
nodes.  Research  results  indicate  current  feasible  features  of  such  a 
monitor  and  also  suggest  additional  features  that  should  be  implemented. 

5.1.   Currently Feasible Monitor Features

Evidence  is  available  from  the  investigation  of  response  time 
at  the  five  computing  nodes  included  in  this  study  to  suggest  three 
immediately  implementable  monitor  features.  The  first  of  these  is  a 
table  of  load  levels  at  each  node  by  time  of  day  and  day  of  the  week. 
If  ten  load  levels  are  defined  across  the  observable  load  range  for  all 
computing  nodes,  as  was  described  in  section  2.2.3.,  then  users  could 
gain  a  snap-shot  overview  of  relative  busy  times  at  any  one  node.  This 
type  of  information  might  influence  a  decision  about  when  to  do  work  on 
a  particular  system.  An  example  of  a  section  of  such  a  table  has  been 
compiled  for  the  AMES-TSS  system  and  it  is  presented  in  Table  5.1. 
The data in the table approximate system behavior during May and June of
1974. For user convenience, the time of day on these tables should be
translated to the time zone (EDT, EST, PST, etc.) from which an inquiry
is made. Times in the AMES-TSS table correspond to the time framework
of a user at the University of Illinois node.


Table  5.1.    Load  Levels  at  AMES-TSS 


Sunday 

Monday  -  Friday 

1 
Saturday 

1-2 

-8  AM 

8-9  AM 

1  -  2 

1 - 2

9-10AM

1 - 3

1 - 2

10AM- 2  PM 

5  -  8 

2-3  PM 

5  -  6 

3-7  PM 

7  -  9 

7-8  PM 

3 - 4

8-9 PM

1 - 2

3 - 4

9PM- 

A second feature to be included in a dynamic response time
monitor is a descriptive text explaining relevant local factors affecting
system response times at each node. In some cases, such explanations
are buried in "HELP" files associated with a particular time-sharing
system. Also, the Network Information Center of the ARPA network
provides a brief explanation of local conditions in its NIC publication
No. 18666. These sources of local load information are either incomplete
(not available at every node) or out of date and are not necessarily
easily accessible to all potential users of a particular computing node.
For example, the NIC "Service Schedule" description for the AMES-TSS
system published in August of 1973 reads as follows:

     AMES-67 is available 24 hours per day but severe
     loading generally restricts access from 0800 to
     1700 PST.  The weekend schedule varies.  Typical
     Load is 30-50 users (including batch).  The
     maximum number of users is regulated dynamically
     by loading.  Network users are not regulated
     separately.  [ANR73b]

This description is accurate, but omits information that may be useful,
or at least of interest, to a network user. For instance, the AMES
system has developed a "Resource Allocation Scheme" which attempts to
guarantee a certain level of service to authorized priority users at
various times throughout the day (one group has priority from 8-10AM,
another from 10AM-noon and so on). Because of this, the load measure,
PI, rarely goes below .250. When the guaranteed level of service for a
particular priority group is being threatened by a heavy load, then
system access is curtailed for all non-priority users, including the
non-priority network user. A further point of interest about the AMES
system is that the local user group works a fairly regular 8AM-5PM
schedule, taking the noon hour for lunch. Thus, the machine is lightly
loaded during noon-1PM PST.

In addition to the load level tables and load descriptive text,
the dynamic response time monitor must include an inquiry feature by
which a network user can obtain actual current comparative response
time data for a job to be processed. The inquiry feature would be made
up of two interactive modules: the user interface and the predictive
mechanism.

The  user  interface  would  require  user  input  consisting  of  the 
set  of  nodes  at  which  response  time  is  to  be  calculated  and  the  CPU  and 
I/O processing characteristics of the job to be submitted. The output
to  this  user  inquiry  would  consist  of  a  list  of  expected  response  times 
at  each  of  the  indicated  nodes,  including  the  current  load  level  at  each 
node.   This  data  for  output  would  be  generated  by  the  predictive  module 
of  the  inquiry  feature.   Prediction,  of  course,  is  at  the  heart  of  the 
dynamic  response  time  monitor  and  the  feasibility  of  the  predictive 
feature  has  been  verified  by  this  research. 
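
The division of the inquiry feature into a user interface and a predictive module might be sketched as follows. All names here are hypothetical, and the linear predictor is only a stand-in for the statistical, analytic or simulation models developed in this study; its coefficients are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class JobProfile:
    """CPU and I/O processing characteristics supplied by the user."""
    cpu_seconds: float
    io_seconds: float


# A predictor maps (current load level, job profile) to expected seconds.
Predictor = Callable[[float, JobProfile], float]


def linear_predictor(base: float, slope: float) -> Predictor:
    """Hypothetical stand-in model: response grows linearly with load level."""
    def predict(load_level: float, job: JobProfile) -> float:
        demand = job.cpu_seconds + job.io_seconds
        return demand * (base + slope * load_level)
    return predict


# Predictive module: one model per node (coefficients are invented).
PREDICTORS: Dict[str, Predictor] = {
    "BBN-TENEX": linear_predictor(1.0, 0.5),
    "MIT-MULTICS": linear_predictor(1.2, 0.3),
}


def inquire(nodes: List[str], job: JobProfile,
            current_load: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """User interface: (node, load level, expected response), fastest first."""
    rows = [(node, current_load[node], PREDICTORS[node](current_load[node], job))
            for node in nodes]
    return sorted(rows, key=lambda row: row[2])


job = JobProfile(cpu_seconds=3.0, io_seconds=1.0)
load = {"BBN-TENEX": 6.0, "MIT-MULTICS": 2.0}
for node, level, seconds in inquire(["BBN-TENEX", "MIT-MULTICS"], job, load):
    print(f"{node:12s} load level {level:4.1f}   expected response {seconds:5.1f} s")
```

Keeping the predictors behind a common call signature would allow a node's model to be replaced or retuned without changing the user interface.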

For each of the five different time-sharing systems investigated
in  this  study,  it  was  possible  to  develop  a  statistical  model  in  all 
cases,  and  an  analytic  and  simulation  model  in  most  cases,  to  describe 
and  predict  the  response  time  behavior  of  that  system  as  it  processed 
a  limited  set  of  benchmark  jobs.   The  initial  indication  from  the  analytic 
and  simulation  models  is  that  they  can  be  easily  extended  to  predict 
response  times  for  more  general  classes  of  jobs  than  the  three  benchmark 
applications.  Moreover,  the  systems  themselves  represented  a  wide  range 
of  time-sharing  scheduling  implementations,  including  the  unique  AMES-TSS 
table  driven,  memory-usage  dominated  system.   Successful  description  and 
prediction  of  behavior  for  this  wide  variety  of  time-sharing  schedulers 
suggests  equal  success  with  other  time-sharing  systems  whose  scheduling 
is  any  variation  on  the  general  time-sharing  scheduling  algorithm  as 
described  in  section  2.1. 

Results have been obtained for the network transmission and
queueing  delays  which  add  significantly  to  response  time  when  the 
network  itself  becomes  congested.   Should  network  usage  become  such  that 
the  network  approaches  a  saturated  state,  then  these  queueing  delays 
would  have  to  be  added  to  the  individual  system  delay.  Although 
calculations  made  for  this  research  were  done  only  for  very  short  messages, 
the  same  Cole  expression  can  be  used  when  input  or  output  message  length 
is  expected  to  be  greater  than  that  able  to  be  transmitted  in  one  message 
packet.   Thus,  an  analytic  model,  able  to  be  used  from  any  network  node, 
is  available  for  prediction  of  this  component  of  the  response  time. 

5.2.   Additional Desirable Monitor Features

Besides the features which have already proven to be immediately
implementable  components  of  a  dynamic  response  time  monitor,  there  exist 
other  desirable  monitor  features  which  would  make  utilization  of  network 
resources  easier  for  the  user.  Chief  among  these  is  comparative  cost 
information.   Some  preliminary  work  done  by  Peter  Alsberg  at  the  University 
of  Illinois  Center  for  Advanced  Computation  illustrates  the  difficulties 
encountered  in  collecting  charging  algorithm  data  for  individual  systems 
on  the  ARPA  network.   Some  systems  have  free  accounts  for  network  users 
and  some  heavily  subsidized  systems  use  charging  algorithms  that  do  not 
reflect  their  actual  expenses.  Further,  information  is  needed  on  network 
routing expenses since, if charging were done on a node by node basis,
then  some  systems  which  offer  a  cost  advantage  as  individual  entities 
may  lose  that  advantage  due  to  extensive  job  routing  requirements. 

Given both comparative response time data and comparative cost
data, the dynamic monitor could be extended to appear to the user as a
dealer in network services. The monitor would be able to indicate the
fastest response time possible at the highest cost a user is willing
to pay. Thus, the monitor can provide complete time-vs-cost data while
not usurping the users' power to finally decide where to run a job.
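
A minimal sketch of the "dealer" selection described above follows. The node names are those of this study, but the response times and costs are invented for illustration, and the `Offer` structure is hypothetical.

```python
from typing import List, NamedTuple, Optional


class Offer(NamedTuple):
    """Predicted response time and cost of running one job at one node."""
    node: str
    response_sec: float
    cost_dollars: float


def fastest_within_budget(offers: List[Offer], max_cost: float) -> Optional[Offer]:
    """Fastest offer whose cost does not exceed the user's ceiling, if any."""
    affordable = [offer for offer in offers if offer.cost_dollars <= max_cost]
    if not affordable:
        return None
    return min(affordable, key=lambda offer: offer.response_sec)


# Invented figures, for illustration only.
OFFERS = [
    Offer("BBN-TENEX", response_sec=12.0, cost_dollars=3.50),
    Offer("CCN-TSO", response_sec=20.0, cost_dollars=1.25),
    Offer("MIT-MULTICS", response_sec=15.0, cost_dollars=2.00),
]

# The full time-vs-cost list is shown first, so the choice stays with the user.
for offer in sorted(OFFERS, key=lambda o: o.response_sec):
    print(f"{offer.node:12s} {offer.response_sec:5.1f} s   ${offer.cost_dollars:.2f}")
print("suggested within $2.00:", fastest_within_budget(OFFERS, 2.00))
```

Printing the whole list before the suggestion reflects the point made above: the monitor advises, and the user decides where to run the job.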

6.    CONCLUSIONS

This  research  has  shown  that  it  is  feasible  to  develop  a 
response  time  monitor  for  use  in  a  network  computing  system  that  is 
capable  of  providing  comparative  response  time  information  for  users 
with  various  computing  applications  to  process.   System  response  behavior 
was  measured  and  modeled  using  statistical  techniques  as  well  as  analytical 
and simulation techniques. The effect of network traffic on response
times was also considered.

Analysis of measurements on individual time-sharing systems
revealed  that  it  is,  in  fact,  possible  to  describe  and  predict  response 
time  for  these  systems  using  linear  and/or  nonlinear  regression  techniques. 
The  need  for  more  uniform  measures  of  "response  time"  and  system  "busyness" 
was  particularly  evident  in  this  phase  of  the  investigation.  While 
response  time  could  be  satisfactorily  defined  in  a  uniform,  consistent, 
easily  measurable  way,  a  uniform  measure  of  load  level  or  busyness  of  a 
system  was  more  elusive.  A  more  satisfactory  solution  to  the  busyness 
dilemma would have been possible if all systems could have been observed
with  busyness  ranging  from  no  users  to  system  saturation.  Although  the 
lower  bound  was  observable  on  all  systems,  some  of  the  nodes  under 
investigation  did  not  approach  saturation  during  the  measurement  phase 
of  the  research. 

Having  decided  on  a  definition  of  load  level  that  was  uniform 
and  consistent  across  all  systems,  but  perhaps  not  intuitively  pleasing, 
comparison  of  response  times  of  time-sharing  systems  as  they  processed 
given benchmark jobs was possible. Systems could be ranked in
order of fastest to slowest response times for a relatively long
(approximately 45 seconds of processing time) CPU bound job, for a short
(approximately 3 seconds processing time) CPU bound job and for an I/O
bound job.

This  comparative  capability  was  expanded  from  these  three 
specific  benchmark  jobs  to  a  more  general  class  of  jobs  through  the 
development  of  a  single  analytical  and  a  single  simulation  model.   The 
models  were  developed  to  describe  and  predict  the  response  time  behavior 
of  the  time-sharing  systems  involved  in  the  study  and  were  found  to  be 
valid  system  representations  in  three  of  the  five  systems  investigated. 

The  effects  of  increased  network  traffic  were  also  studied 
and  an  expression  was  found  to  predict  this  component  of  response  time 
if  and  when  it  becomes  significant  (adds  delay  on  the  order  of  magnitude 
of seconds to the response time of any individual system). Currently
on the ARPA network, traffic is light and delays due to network congestion
were  not  significant  in  the  response  time  measurements. 

The  successful  results  of  the  various  areas  of  investigation 
described  above  led  to  the  postulation  of  the  feasibility  of  a  dynamic 
response  time  monitor  that  users  could  query  to  obtain  current  on-line 
comparative  response  time  data  for  a  particular  computing  application  run 
on one of a set of network time-sharing facilities. The contents and
structure  of  such  a  monitor  were  discussed. 

6.1.    Implications  for  Future  Network  Development 

User  oriented  network  research  requires  a  commitment  to  the 
investigation  and  development  of  tools  that  go  beyond  mere  reliability 
goals. If, indeed, the ultimate aim of a computing network is resource
sharing,  then  the  human  component  as  well  as  the  technical  components 
of  networking  must  be  fully  investigated  to  achieve  this  goal.   This 
research,  a  first  step  toward  assisting  the  user  in  participating  in  the 
vast  store  of  resources  available  on  a  network,  suggests  that  a  firm 
commitment  on  the  part  of  node  managers  must  be  made  (or  required)  to 
maintain  and  improve  such  assistance. 

The  most  pressing  commitment  on  the  part  of  node  managers, 
needed  to  make  more  effective  the  implementation  of  the  dynamic  response 
time  monitor  discussed  in  section  5.,  is  the  investigation  of  and 
agreement  upon  uniform  response  time  and  load  measures.  Two  of  the  five 
systems  studied  (BBN-TENEX  and  UCSD-CANDE)  already  automatically  generate 
a consistent response time measure, as defined in section 2.2.4., when a
job  is  run.  This  information  is  easily  obtainable  using  a  system  clock 
and  could  be  provided  by  other  network  systems  with  very  likely  only 
a  minimum  of  effort.  An  acceptable  load  measure  may  be  more  difficult, 
but  not  impossible,  to  implement  on  all  network  systems.   The  BBN-TENEX 
"load  average"  measure  which  is  a  ratio  of  jobs  on  the  ready  queue  to  jobs 
on  the  run  queue  has  proved  to  yield  the  least  variation  when  statistical 
analysis  of  response  time  data  is  performed.   It  is  a  highly  dynamic 
measure and a meaningful one in terms of system loading and the users'
conception of system busyness. The "load average" measure is, therefore,
a  prime  candidate  for  a  uniform  measure  of  system  load  on  all  network 
systems. 

A second commitment required of node managers is to the development
and maintenance of descriptive and predictive response time models
for  their  respective  nodes.   This  research  has  illustrated  that  such  models 
are  possible  to  generate  and  can  be  effectively  used.  But  a  considerable 
amount  of  work  is  involved  in  fine  tuning  these  models  so  that  they  are 
accurate for various classes of input jobs (CPU bound, I/O bound, etc.)
and  for  variations  within  and  among  these  classes.  Even  given  that  the 
initial  system  models  may  be  developed  by  an  outside  group,  cooperation 
from  those  persons  most  intimately  involved  with  the  system  and  model 
updating,  at  least  at  times  of  system  configuration  modifications,  are 
essential  to  accurate  response  time  prediction. 

6.2.   Suggested Further Research

There  are,  of  course,  many  other  areas  of  investigation 
not  directly  related  to  dynamic  response  time  monitors,  but  aimed  directly 
at  assisting  users  of  computer  networks,  that  need  to  be  explored.   Some 
of  these  areas  are  comparative  job  cost,  "bidding"  scheduling  disciplines, 
a basic, uniform subset of time-sharing system commands available on any
network  system  and  the  "black  box"  approach  to  scheduling  in  which  the 
user  views  the  network  as  a  single  powerful  system.   If  we  agree  that 
"people  use  computers",  then  we  have  to  agree  to  serve  the  needs  of  the 
computing  community. 

Direct  extensions  of  this  research  require  the  cooperation  of 
all  HOST  facilities  to  gather  the  necessary  data  required  to  make  the 
monitor  universal  to  the  entire  network.  Even  the  extensive  measurements 
collected for the particular five systems investigated in detail are
incomplete  in  that  they  do  not  conclusively  guarantee  response  time 
predictive  capability  for  all  classes  of  computing  applications.  A 
consistent,  uniform  response  time  measure  and  load  level  measure  must 
be  adopted  by  all  network  HOSTS.  Facilities  should  be  provided  for 
forcing  the  systems  into  saturation  so  that  system  behavior  can  be 
observed  under  all  loading  conditions  and  so  that  comparisons  of  systems 
can  be  made  more  conceptually  satisfying.  Fine  tuning  of  the  basic  models 
developed  in  this  research  must  be  done  for  various  kinds  of  computing 
applications  and  models  as  well  as  tables  and  descriptions  of  system 
loading  characteristics  must  be  continually  updated  so  as  to  credibly 
correspond  to  users'  actual  experience  with  a  system. 

A  further  extension  of  this  research  is  the  investigation  of 
comparative  system  costs  so  that  users  are  enabled  to  balance  their 
response  time  desires  with  their  budget  constraints. 

A  final  suggestion  for  future  research  which  may  be  of  particular 
significance  in  determining  the  viability  of  the  whole  computer  networking 
concept  is  to  determine  the  degree  to  which  users  at  various  sites  are 
motivated  to  exploit  the  resources  at  other  network  nodes,  given  that 
the  advantages  of  such  activities  are  made  readily  apparent  to  them. 
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APPENDIX  A 


Definitions  and  Abbreviations 


AMES-TSS
    Time Sharing System created by IBM and run on an IBM 360/67 at the
    NASA Ames Research Center, Moffett Field, California. This
    interactive system is characterized by a table-driven process
    scheduler, in which the frequency and duration of processor time
    slices awarded to processes are determined by the process' paging
    behavior.

BBN-TENEX
    A time-sharing system run on a PDP-10 machine at Bolt, Beranek and
    Newman, Incorporated in Cambridge, Massachusetts. The scheduler is
    characterized by five priority queues and a "balance set" control
    module which regulates running processes so as to minimize the
    probability of an idle CPU due to too frequent page faults.

CANDE
    See UCSD-CANDE below.

CCN-TSO
    Time Sharing Option created by IBM and run on an IBM 360/91 at the
    Campus Computing Network on the University of California campus in
    Los Angeles. The scheduler is distinguished by its binding
    processes to one of a fixed number of virtual machines within
    which no multiprogramming occurs.

FIFO
    A scheduling discipline in which processes are served in a
    first-in, first-out order.

MIT-MULTICS
    A time-sharing system run on a Honeywell 645 at the Massachusetts
    Institute of Technology in Cambridge. This scheduler is
    characterized by its concept of a set of "eligibles" which
    consists of those processes having the highest dispatching
    priority that can simultaneously exist in core.

MULTICS
    See MIT-MULTICS above.

OS/MVT
    An IBM Operating System in which a multiprogramming environment
    exists.

OS/VS2
    An IBM Operating System characterized by its Virtual Storage
    memory allocation scheme.

Packet Switching
    A method for sending transmissions through a communications
    network in which messages are broken down into smaller "packets"
    of information to be transmitted separately and reassembled by
    the receiver.

RR
    A scheduling discipline in which processes are scheduled Round
    Robin; that is, they each receive a specified amount of service
    and then are returned to the end of the service queue if they
    have not completed execution in the specified time.

Store-and-forward Network
    A computer network in which messages to be transmitted are stored
    in each node along the transmission path until they are safely
    received by the next node in their path.

TENEX
    See BBN-TENEX above.

Thrashing
    A state occurring in paged memory systems in which too many
    different working sets occupy main memory and each displaces the
    others' pages in an attempt to have its own pages present.

TSO
    See CCN-TSO above.

TSS
    See AMES-TSS above.

UCLA
    University of California at Los Angeles

UCSD-CANDE
    A time-sharing system run on a Burroughs 6700 machine at the
    University of California at San Diego. The scheduler is
    characterized by two priority queues, with a high priority queue
    serving burst-oriented processes and a low priority queue serving
    compute-bound processes.
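The FIFO and RR entries above describe two service disciplines that can be contrasted in a few lines. The following Python sketch is illustrative only: the job names, service demands and quantum are hypothetical, and it models a single queue with no arrivals after time zero, which is far simpler than any of the schedulers studied here.

```python
from collections import deque


def fifo(jobs):
    """Serve each job to completion in arrival order; return completion times."""
    clock, done = 0, {}
    for name, service in jobs:
        clock += service
        done[name] = clock
    return done


def round_robin(jobs, quantum):
    """Give each job at most `quantum` units of service, then requeue it
    at the end of the queue if it has not yet finished."""
    queue = deque(jobs)
    clock, done = 0, {}
    while queue:
        name, remaining = queue.popleft()
        used = min(quantum, remaining)
        clock += used
        if remaining > used:
            queue.append((name, remaining - used))
        else:
            done[name] = clock
    return done


# Hypothetical workload: (job name, required service time).
jobs = [("A", 5), ("B", 1), ("C", 2)]
print("FIFO:", fifo(jobs))
print("RR:  ", round_robin(jobs, quantum=1))
```

Under these assumptions the short jobs finish earlier under RR than under FIFO, while the long job finishes no earlier, which is the usual motivation for using RR on interactive systems.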
APPENDIX B


Benchmark  Jobs 


B.1.  MIT-MULTICS Number Cruncher

C     BUILD A 100 BY 100 DATA MATRIX AND CORRELATE ITS COLUMNS
      REAL CRL(100,100),DATA(100,100),SUM(100),SD(100),OBS,TSUM,TSUMS
      INTEGER I,J,K,L,M
      DATA L/100/,M/100/
      DO 10 I=1,M
      DO 10 J=1,L
   10 DATA(I,J)=1./(3*I-3+J)
      CALL CORREL(CRL,DATA,SUM,SD,L,M)
      STOP
      END

      SUBROUTINE CORREL(CRL,DATA,SUM,SD,L,M)
      INTEGER L,M,I,J,K
      REAL CRL(M,M),DATA(L,M),SUM(M),SD(M),OBS,TSUM,TSUMS
      OBS=M
C     COLUMN SUMS AND CORRECTED ROOT SUMS OF SQUARES
      DO 100 I=1,L
      TSUM=0.
      TSUMS=0.
      DO 20 J=1,M
      TSUM=TSUM+DATA(J,I)
   20 TSUMS=TSUMS+DATA(J,I)**2
      SUM(I)=TSUM
      SD(I)=SQRT(TSUMS-TSUM*TSUM/OBS)
  100 CRL(I,I)=1.
      LM1=L-1
C     OFF-DIAGONAL CORRELATIONS, STORED SYMMETRICALLY
      DO 150 I=1,LM1
      IP1=I+1
      DO 150 J=IP1,L
      TSUM=0.
      DO 125 K=1,M
  125 TSUM=TSUM+DATA(K,I)*DATA(K,J)
      CRL(I,J)=(TSUM-SUM(I)*SUM(J)/OBS)/(SD(I)*SD(J))
  150 CRL(J,I)=CRL(I,J)
      RETURN
      END
B.2.  MIT-MULTICS Bit String Manipulator

CONN100: PROC;
   DCL (SYSIN, SYSPRINT) FILE;
   DCL (FOUND, GOAL, REALITY, LAST) BIT(10201) ALIGNED;
   DCL (I, J, ITERATIONS) FIXED BIN;
   DCL SEED FIXED BIN(17);
   DCL MULTIPLIER FIXED BIN;

   MULTIPLIER = 57;
   SEED = 99;

   /* FILL REALITY WITH PSEUDO-RANDOM BITS, 17 AT A TIME */
   DO I=1 TO 10201 BY 17;
      IF SEED=0 THEN SEED = MULTIPLIER;
      SEED = MOD(SEED*MULTIPLIER, 131072);
      SUBSTR(REALITY, I, 17) = BIT(SEED);
   END;

   /* CLEAR THE FIRST ROW AND THE FIRST BIT OF EACH LATER ROW */
   SUBSTR(REALITY, 1, 101) = (100)"0"B;
   DO I=102 TO 10201 BY 101;
      SUBSTR(REALITY, I, 1) = "0"B;
   END;

   GOAL, FOUND, LAST = "0"B;
   SUBSTR(GOAL, 10102, 100) = (100)"1"B;
   SUBSTR(FOUND, 103, 100) = SUBSTR(REALITY, 103, 100);
   ITERATIONS = 1;

   /* SPREAD FOUND TO NEIGHBORING BITS UNTIL STABLE OR GOAL REACHED */
   DO WHILE ((FOUND ^= LAST) & ((FOUND & GOAL) = "0"B));
      LAST = FOUND;
      ITERATIONS = ITERATIONS+1;
      SUBSTR(FOUND, 102) = SUBSTR(REALITY, 102) & (FOUND | SUBSTR(FOUND, 101) |
         SUBSTR(FOUND, 102) | SUBSTR(FOUND, 103) | SUBSTR(FOUND, 203));
   END;
END CONN100;


B.3.  MIT-MULTICS I/O Bound

FILFLG: PROC;
   DECLARE I FIXED BIN(31);
   DECLARE (NUMBERRECS INIT(1000),
            RECLENGTH INIT(250))
               FIXED BIN(15),
           FILEIN FILE RECORD,
           FILEOT FILE RECORD,
           1 RECORD ALIGNED,
             2 WORTHLESSTEXT CHAR(250) INIT((250)"X");

   /* WRITE NUMBERRECS RECORDS OF 250 CHARACTERS EACH */
   OPEN FILE(FILEOT) TITLE("VFILE_ TSTJKM") OUTPUT;
   DO I=1 TO NUMBERRECS;
      WRITE FILE(FILEOT) FROM(RECORD);
   END;
   CLOSE FILE(FILEOT);

   /* READ THE SAME RECORDS BACK */
   OPEN FILE(FILEIN) TITLE("VFILE_ TSTJKM") INPUT;
   DO I=1 TO NUMBERRECS;
      READ FILE(FILEIN) INTO(RECORD);
   END;
   CLOSE FILE(FILEIN);
END FILFLG;
APPENDIX C
Relevant  Statistical  Data 

Comparison of residual mean squares (RMS) for the individual
system data curve fits was one of the criteria used to determine a
"best fit" to the response time data. Table C.1 contains the RMS for
the quadratic, cubic and exponential curve fits for each of the
benchmark jobs run. The other criteria used were possibility of fit
(does the regression curve indicate that the response time is negative
for some range of the data?) and probability of fit (does the
regression curve indicate a higher response time for a lower load
level than for a higher one?). The final choices for the best fit
curve are listed in Table C.2. The ratio of regression sum of squares
to total sum of squares given in that table is a measure of how well
the regression curve explains the total variation in the data. A ratio
of 1.0 would indicate a perfect fit.
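The two statistics named above can be computed as in the following Python sketch. It is a minimal illustration under stated assumptions, not the procedure actually used for this study: the observations are hypothetical, the residual mean square is taken as the residual sum of squares divided by the number of observations less the number of fitted coefficients, and only polynomial fits are shown. The helper names (`solve`, `polyfit`, `polyval`, `fit_statistics`) are introduced here for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... from the normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def fit_statistics(ys, fitted, n_params):
    """Return (residual mean square, regression SS / total SS)."""
    mean = sum(ys) / len(ys)
    residual_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    regression_ss = sum((f - mean) ** 2 for f in fitted)
    total_ss = sum((y - mean) ** 2 for y in ys)
    return residual_ss / (len(ys) - n_params), regression_ss / total_ss


# Hypothetical (load level, response time) observations, not thesis data.
loads = [1, 2, 3, 4, 5, 6]
times = [2.1, 4.8, 10.3, 16.9, 26.2, 36.8]
for degree in (2, 3):
    coeffs = polyfit(loads, times, degree)
    fitted = [polyval(coeffs, x) for x in loads]
    print(degree, fit_statistics(times, fitted, degree + 1))
```

A lower residual mean square and a ratio closer to 1.0 both indicate a closer fit; the ratio is easier to compare across systems because it does not depend on the scale of the response times.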
Table C.1.  Residual Mean Square (RMS) Statistics


Location  Benchmark      (RMS)         (RMS)         (RMS)
                         Quadratic     Cubic         Exponential

AMES      No. Cruncher   923.54        848.92        788.29
          Bit Manipul.   145.66        150.19        145.95
          I/O Bound      985.42        1029.69       952.04

BBN       No. Cruncher   1.49(10^5)    1.47(10^5)    1.35(10^5)

CCN       No. Cruncher   1217.02       1882.58       1079.43
          Bit Manipul.   839.89        1591.57       835.72
          I/O Bound      6811.62       6725.53       541.88

MIT       No. Cruncher   7.49(10^4)    7.49(10^4)    6.62(10^4)
          Bit Manipul.   11.72         12.35         12.19
          I/O Bound      25.68         23.19         26.43

UCSD      No. Cruncher   1461.07       1542.04       1301.9
Table C.2.  Individual System Best Curve Fit Data


Location  Benchmark      RSS/TSS*   Type of Curve
                                    for Best Fit

AMES      No. Cruncher   .44        Exponential
          Bit Manipul.   .59        Quadratic
          I/O Bound      .67        Cubic

BBN       No. Cruncher   .87        Exponential

CCN       No. Cruncher   .59        Exponential
          Bit Manipul.   .64        Exponential
          I/O Bound      .53        Exponential

MIT       No. Cruncher   .37        Exponential
          Bit Manipul.   .44        Exponential
          I/O Bound      .59        Exponential

UCSD      No. Cruncher   .81        Exponential

*Regression Sum of Squares/Total Sum of Squares
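
The two screening criteria described in the text of this appendix, possibility of fit and probability of fit, amount to checking a fitted curve for negative values and for decreasing stretches over the observed load range. The Python sketch below shows one way such a check might be done; the function name `screen_curve`, the grid evaluation and the example curves are assumptions made for illustration and are not taken from the study.

```python
import math


def screen_curve(curve, load_min, load_max, steps=200):
    """Evaluate `curve` on an even grid over [load_min, load_max].

    Returns (never_negative, never_decreasing): the first is False if the
    curve predicts a negative response time anywhere on the grid, the
    second is False if a lower load level ever yields a higher predicted
    response time than the next higher one.
    """
    grid = [load_min + (load_max - load_min) * i / steps for i in range(steps + 1)]
    values = [curve(x) for x in grid]
    never_negative = all(v >= 0 for v in values)
    never_decreasing = all(b >= a for a, b in zip(values, values[1:]))
    return never_negative, never_decreasing


# Hypothetical fitted curves over load levels 0 to 10.
def exponential(x):
    return 2.0 * math.exp(0.3 * x)


def dipping_quadratic(x):
    return 40.0 - 12.0 * x + x * x


def negative_quadratic(x):
    return 5.0 - 12.0 * x + x * x


for name, curve in [("exponential", exponential),
                    ("dipping quadratic", dipping_quadratic),
                    ("negative quadratic", negative_quadratic)]:
    print(name, screen_curve(curve, 0.0, 10.0))
```

A curve that fails either check would be set aside even if its residual mean square were the smallest, which is one reason the curve chosen in Table C.2 need not be the one with the lowest RMS in Table C.1.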